udibr / headlines

Automatically generate headlines to short articles
MIT License

Where is the `model.fit` call? #15

Open JayPhate opened 7 years ago

JayPhate commented 7 years ago

I want to use an encoder-decoder model on some other data, and I am trying to understand this code, but I couldn't find a call to the `fit` method in train.ipynb. After padding the descriptions and headings, how do I use these vectors to train the model? What are the dimensions of X and Y in `model.fit`? I assume X has shape (#descriptions, 50) and Y has shape (#headings, 50), where #descriptions equals #headings.

Below is the command I used to fit the model: `model_fit = model.fit(nxTrain, nyTrain, nb_epoch=1, batch_size=64, verbose=2)`. The shapes of X and Y passed to `model.fit` are `xTrain.shape == (17853, 50)` and `yTrain.shape == (17853, 25)`.

But I got the error below:

Exception: Error when checking model target: expected activation_1 to have 3 dimensions, but got array with shape (17853, 25)
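The error makes sense given the model summary below: the final `activation_1` layer outputs a tensor of shape (None, 25, 40000), so the target must also be 3-D, one softmax distribution over the vocabulary per output timestep. A toy numpy sketch (not from the repo; `vocab_size` and the index values here are made up for illustration) of expanding an integer label matrix into such a 3-D one-hot target:

```python
import numpy as np

vocab_size = 6                    # toy vocabulary (the real model uses 40000)
y = np.array([[1, 3],
              [2, 5]])           # integer word indices, shape (samples, timesteps) = (2, 2)

# Index rows of an identity matrix to turn each word index into a one-hot
# vector, giving shape (samples, timesteps, vocab_size) = (2, 2, 6),
# which matches a softmax-over-vocabulary output layer.
y_onehot = np.eye(vocab_size)[y]

print(y_onehot.shape)  # (2, 2, 6)
```

With a real 40000-word vocabulary this expansion is enormous, which is why the thread below suggests the sparse loss instead.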

Here is the model summary, from `print(model.summary())`:


Layer (type)                      Output Shape        Param #    Connected to
________________________________________________________________________________
embedding_1 (Embedding)           (None, 50, 100)     4000000    embedding_input_1[0][0]
lstm_1 (LSTM)                     (None, 50, 512)     1255424    embedding_1[0][0]
dropout_1 (Dropout)               (None, 50, 512)     0          lstm_1[0][0]
lstm_2 (LSTM)                     (None, 50, 512)     2099200    dropout_1[0][0]
dropout_2 (Dropout)               (None, 50, 512)     0          lstm_2[0][0]
lstm_3 (LSTM)                     (None, 50, 512)     2099200    dropout_2[0][0]
dropout_3 (Dropout)               (None, 50, 512)     0          lstm_3[0][0]
simplecontext_1 (SimpleContext)   (None, 25, 944)     0          dropout_3[0][0]
timedistributed_1 (TimeDistribut  (None, 25, 40000)   37800000   simplecontext_1[0][0]
activation_1 (Activation)         (None, 25, 40000)   0          timedistributed_1[0][0]
________________________________________________________________________________
Total params: 47253824

None

I used the same model as explained in train.ipynb, so I don't understand what's wrong here.

udibr commented 7 years ago
  1. nyTrain should be renamed to yTrain, because it is the matrix itself, not its size (same for nxTrain).
  2. The notebook uses the inefficient loss categorical_crossentropy, which requires the word labels to be expanded to the size of the vocabulary; look for the usage of np_utils.to_categorical in the notebook. If you want, you can switch to sparse_categorical_crossentropy, which does not require the huge memory needed to keep yTrain in an expanded one-hot form. However, I think that in that case you need to add an extra dimension of size 1 to yTrain, for example by doing np.expand_dims(yTrain, -1).
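The expand_dims step above can be sketched in plain numpy (the zeros array is just a stand-in for the real (17853, 25) matrix of word indices):

```python
import numpy as np

# Stand-in for the real yTrain: integer word indices, shape (17853, 25).
yTrain = np.zeros((17853, 25), dtype=np.int32)

# Append a trailing axis of size 1, giving shape (17853, 25, 1).
# sparse losses typically expect one integer label per timestep in
# this layout, rather than a full one-hot expansion.
yTrain3d = np.expand_dims(yTrain, -1)

print(yTrain3d.shape)  # (17853, 25, 1)
```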

HTH, Udi

udibr commented 7 years ago

You are welcome to make the changes and send a PR.

JayPhate commented 7 years ago

@udibr I am very new to Keras and NNs. Why do we need np_utils.to_categorical? I have already converted all the vocabulary words to indices, so I am training on indices, not words. I am trying to build a many-to-many sequence-labeling model, the fifth model from the left in the image below.

[image: seq_labeling_modules]

My issue is very similar to https://github.com/fchollet/keras/issues/2654, but from that thread I didn't understand the input and output shapes of the data.

You also suggested adding an extra dimension of size 1 to yTrain; can you elaborate on why? The shape of yTrain is currently (17853, 25), so what will the shape be after adding the extra dimension?