vsl9 / Sentiment-Analysis-with-Convolutional-Networks

Convolutional Neural Network model for Sentiment Analysis of IMDB movie reviews

Why indices when creating sentences? #2

Closed dhruvparamhans closed 8 years ago

dhruvparamhans commented 9 years ago

Hello there,

I was going through the way you constructed the sentence matrix, and there is one thing I didn't understand. Why did you take the indices of the words in the vocabulary while creating the sentence matrices? I would have imagined that, as input for the convolutional neural network, we would be constructing matrices out of the word vectors for each word in the review. Or am I missing something here?

And I am sorry to have posted this question as an issue. I would have contacted you by email but I couldn't find any contact information.

Thanks for your help

vsl9 commented 9 years ago

Hi Dhruv,

Why did you take the indices of the words in the vocabulary while creating the sentence matrices? I would have imagined that as input for the convolutional neural network, we would be constructing matrices out of the word vectors for each word in the review.

Right, the convolutional layer needs the input data to be presented as one matrix per review, where each row contains a word vector. But what if we put an additional layer before the convolutional layer that transforms word indices into word vectors? Such a layer exists in Keras, and it's called Embedding. It can be seen as a simple look-up table, but it has one important property: it allows us to fine-tune (train) the word vectors by backpropagation in order to get even higher accuracy. First, this layer (look-up table) is initialized with pre-trained word vectors; then these vectors are modified during training. That's why it has been used in the code. That said, there is no problem with preparing the input for the convolutional layer directly as a tensor consisting of matrices of word vectors. It will also do the job.
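
To make that concrete, here is a minimal sketch (not the repo's exact code, with made-up sizes and the Keras 1.x-style `weights` argument) of an Embedding layer used as a trainable look-up table initialized with pre-trained word vectors:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

# Illustrative values, not the repo's actual settings
vocab_size, emb_dim, max_len = 10000, 300, 400

# Stand-in for the pre-trained word2vec matrix: row i holds the vector
# of the word with vocabulary index i
W = np.random.randn(vocab_size, emb_dim).astype('float32')

model = Sequential()
model.add(Embedding(input_dim=vocab_size,
                    output_dim=emb_dim,
                    input_length=max_len,
                    weights=[W]))  # initialize the look-up table with word2vec

# Input:  (batch, max_len) integer word indices
# Output: (batch, max_len, emb_dim) word vectors, fine-tuned by backpropagation
```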

And I am sorry to have posted this question as an issue.

Feel free to ask questions. Perhaps they will be useful for someone else, or maybe they will inspire me to write a comprehensive blog post. :smiley:

dhruvparamhans commented 8 years ago

Right, so during the embedding step, you use the word-vector representations of every word as the weights which are then modified during the training of the model. Am I right?

I have two questions:

  1. Why do you have two kernel_size variables in the code? The first one is used in the get_idx_from_sent function, and the second one is used later when training the model. If I understand correctly, the second corresponds to the size of the matrix (kernel_size * 300). However, I am at a loss for the first one. Is it used for padding when you first create the sentence index matrix?
  2. In the model that you created, how can we introduce a "depth slice" as explained here? Because, if I understood the model correctly, we just have a two-dimensional matrix and we use only one kernel?

Thanks once again for your help.

vsl9 commented 8 years ago

Right, so during the embedding step, you use the word-vector representations of every word as the weights which are then modified during the training of the model. Am I right?

Yes, the word vectors are the parameters of the Embedding layer.

Why do you have two kernel_size variables in the code? The first one is used in the get_idx_from_sent function, and the second one is used later when training the model. If I understand correctly, the second corresponds to the size of the matrix (kernel_size * 300).

Yes, the second kernel_size is the first dimension of the convolutional kernel. Since word vectors have dimension 300, the convolutional kernel size is (kernel_size, 300).

However, I am at a loss for the first one. Is it used for padding when you first create the sentence index matrix?

Exactly, it's the padding originally proposed by Yoon Kim, though my experiments showed that it is unnecessary. So feel free to set this value to 0.
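
For illustration, here is a rough sketch of the padding idea. The function name matches the one mentioned above, but the body is an assumption rather than the repo's exact code: the sentence is prefixed (and suffixed) with kernel_size - 1 zero indices so that every real word can appear in every kernel position.

```python
def get_idx_from_sent(sent, word_idx_map, max_len=400, kernel_size=5):
    """Turn a review into a fixed-length list of word indices (sketch only)."""
    pad = kernel_size - 1
    x = [0] * pad                      # leading zeros: the Kim-style padding
    for word in sent.split():
        if word in word_idx_map:       # index 0 is reserved for padding/unknown
            x.append(word_idx_map[word])
    while len(x) < max_len + 2 * pad:  # pad the tail up to a fixed length
        x.append(0)
    return x[:max_len + 2 * pad]       # truncate overly long reviews
```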

In the model that you created, how can we introduce a "depth slice" as explained here?

Well, it's already there.

Because, if I understood the model correctly, we just have a two-dimensional matrix and we use only one kernel?

All data at the inputs and outputs of an arbitrary layer is a 4D tensor (array): N x C x H x W. N is the number of samples (reviews) in a batch, C is the number of channels (feature maps), H is the height, and W is the width.

So the input data of the convolutional layer has shape = (N, 1, max_len, 300). The convolutional layer processes it with N_fm kernels of size (kernel_size, 300), so the output data of the convolutional layer has shape = (N, N_fm, max_len - kernel_size + 1, 1). That is, the depth is N_fm.
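
As a rough sketch of that shape flow (Keras 1.x-style layer names and Theano channels-first ordering assumed; the sizes are illustrative, not the repo's exact values):

```python
from keras.models import Sequential
from keras.layers import Embedding, Reshape, Convolution2D

vocab_size, max_len, N_fm, kernel_size = 10000, 400, 300, 8

model = Sequential()
model.add(Embedding(vocab_size, 300, input_length=max_len))  # -> (N, max_len, 300)
model.add(Reshape((1, max_len, 300)))                        # -> (N, 1, max_len, 300)
model.add(Convolution2D(N_fm, kernel_size, 300,
                        border_mode='valid',
                        dim_ordering='th'))                  # -> (N, N_fm, max_len-kernel_size+1, 1)
```

Each of the N_fm kernels produces one feature map, i.e. one depth slice of the output volume.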