jellAIfish / jellyfish

This repository is inspired by Quinn Liu's repository Walnut.

Convolutional Neural Networks for Sentence Classification #46

Closed markroxor closed 6 years ago

markroxor commented 6 years ago

https://arxiv.org/pdf/1408.5882.pdf

markroxor commented 6 years ago

Our work is philosophically similar to Razavian et al. (2014) which showed that for image classification, feature extractors obtained from a pre-trained deep learning model perform well on a variety of tasks—including tasks that are very different from the original task for which the feature extractors were trained.

markroxor commented 6 years ago

[screenshot from 2018-01-18 18-25-46]

markroxor commented 6 years ago

Let x_i ∈ R^k be the k-dimensional word vector corresponding to the i-th word in the sentence. A sentence of length n (padded where necessary) is represented as

x_{1:n} = x_1 ⊕ x_2 ⊕ … ⊕ x_n,  (1)

where ⊕ is the concatenation operator. In general, let x_{i:i+j} refer to the concatenation of words x_i, x_{i+1}, …, x_{i+j}. A convolution operation involves a filter w ∈ R^{hk}, which is applied to a window of h words to produce a new feature. For example, a feature c_i is generated from a window of words x_{i:i+h−1} by

c_i = f(w · x_{i:i+h−1} + b).  (2)

Here b ∈ R is a bias term and f is a non-linear function such as the hyperbolic tangent. This filter is applied to each possible window of words in the sentence {x_{1:h}, x_{2:h+1}, …, x_{n−h+1:n}} to produce a feature map

c = [c_1, c_2, …, c_{n−h+1}].
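Equations (1) and (2) can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the sentence length, embedding dimension, window size, and random values below are assumptions chosen only to show the shapes involved.

```python
import numpy as np

# Toy dimensions (assumptions, not the paper's settings):
n, k, h = 7, 5, 3              # sentence length, embedding dim, window size
rng = np.random.default_rng(0)

x = rng.standard_normal((n, k))   # rows are the word vectors x_i in R^k
w = rng.standard_normal(h * k)    # one filter w in R^{hk}
b = 0.1                           # bias term b in R

def feature_map(x, w, b, h):
    """c = [c_1, ..., c_{n-h+1}] with c_i = tanh(w · x_{i:i+h-1} + b)."""
    n = x.shape[0]
    c = []
    for i in range(n - h + 1):
        # concatenation x_i ⊕ ... ⊕ x_{i+h-1}, flattened to length h*k
        window = x[i:i + h].reshape(-1)
        c.append(np.tanh(w @ window + b))
    return np.array(c)

c = feature_map(x, w, b, h)
print(c.shape)   # one feature per window: (n - h + 1,)
```

With n = 7 and h = 3 the filter sees 5 windows, so the feature map has 5 entries, matching the n − h + 1 count in the quote.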

We have described the process by which one feature is extracted from one filter. The model uses multiple filters (with varying window sizes) to obtain multiple features. These features form the penultimate layer and are passed to a fully connected softmax layer whose output is the probability distribution over labels.
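Continuing the toy sketch above: the paper applies max-over-time pooling to each feature map (a step elided in the quote) before the pooled features reach the softmax layer. The filter widths, filter counts, and label count below are illustrative assumptions, not the paper's hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 7, 5                        # toy sentence length and embedding dim
x = rng.standard_normal((n, k))
window_sizes = (3, 4, 5)           # multiple filter widths, as the quote says
filters_per_size = 2               # toy value for illustration
num_labels = 2

def feature_map(x, w, b, h):
    n = x.shape[0]
    return np.array([np.tanh(w @ x[i:i + h].reshape(-1) + b)
                     for i in range(n - h + 1)])

# Max-over-time pooling: keep the largest value of each feature map,
# giving one number per filter regardless of sentence length.
pooled = []
for h in window_sizes:
    for _ in range(filters_per_size):
        w = rng.standard_normal(h * k)
        pooled.append(feature_map(x, w, 0.0, h).max())
z = np.array(pooled)               # the penultimate layer

# Fully connected softmax layer: probability distribution over labels.
W = rng.standard_normal((num_labels, z.size))
logits = W @ z
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)
```

Pooling is what lets filters of different widths (and sentences of different lengths) feed a fixed-size penultimate layer: each filter contributes exactly one scalar to z.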

markroxor commented 6 years ago

3.3 Model Variations

We experiment with several variants of the model.

- **CNN-rand**: Our baseline model where all words are randomly initialized and then modified during training.
- **CNN-static**: A model with pre-trained vectors from word2vec. All words—including the unknown ones that are randomly initialized—are kept static and only the other parameters of the model are learned.
- **CNN-non-static**: Same as above but the pre-trained vectors are fine-tuned for each task.
- **CNN-multichannel**: A model with two sets of word vectors. Each set of vectors is treated as a 'channel' and each filter is applied to both channels, but gradients are backpropagated only through one of the channels. Hence the model is able to fine-tune one set of vectors while keeping the other static. Both channels are initialized with word2vec.
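The four variants differ only in how the embedding table is initialized and whether it receives gradients. A hypothetical config sketch (the keys and structure are mine, not the paper's) makes the distinctions explicit:

```python
# Hypothetical configuration sketch of the four variants described above.
# "init" = how word vectors start; "trainable" = whether they get gradients.
variants = {
    "CNN-rand":      {"init": "random",   "trainable": True},
    "CNN-static":    {"init": "word2vec", "trainable": False},
    "CNN-non-static": {"init": "word2vec", "trainable": True},
    # Two channels, both initialized with word2vec; gradients flow
    # through only one of them, so one set fine-tunes and one stays fixed.
    "CNN-multichannel": {
        "init": "word2vec",
        "channels": [{"trainable": True}, {"trainable": False}],
    },
}

for name, cfg in variants.items():
    print(name, cfg)
```

In a real framework this maps to freezing the embedding layer (e.g. excluding it from the optimizer) for the static channel(s) while leaving the other parameters trainable.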