How do I provide 4096-dimensional feature vector to LSTM to get a sentence?

ghost commented 8 years ago

I am trying to generate a brief English description of input images using a CNN and an LSTM. So far, I have finished the CNN module: I get a 4096-dimensional vector as the output of fc7 of my caffemodel (the 16-layer VGG net). Also, if I add a softmax layer on top of the CNN, I am able to get class labels as follows:

For example, given an image of a person sitting on a bed with a mobile phone in his hand, I get class labels like 'person', 'mobile', 'bed'.

Now, I wish to generate a sentence either from these words or from the 4096-dimensional feature vector I get as the CNN output.
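For context, one common approach in the image-captioning literature (not something confirmed in this thread) is to project the fc7 vector down to the LSTM input size, feed it as the first time step, and then feed each predicted word back in as the next input. The sketch below uses plain NumPy with random, untrained weights, so the generated word ids are meaningless. All names and sizes (`W_img`, `W_emb`, `embed_dim`, `END`, and so on) are hypothetical and only illustrate the data flow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,)."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
# Hypothetical sizes; only feat_dim = 4096 comes from the question.
feat_dim, embed_dim, hidden_dim, vocab_size = 4096, 64, 32, 10

fc7 = rng.normal(size=feat_dim)  # stand-in for the real CNN fc7 output

# Untrained (random) parameters, assumed for illustration only.
W_img = rng.normal(scale=0.01, size=(embed_dim, feat_dim))   # image projection
W_emb = rng.normal(scale=0.1, size=(vocab_size, embed_dim))  # word embeddings
W = rng.normal(scale=0.1, size=(4 * hidden_dim, embed_dim))
U = rng.normal(scale=0.1, size=(4 * hidden_dim, hidden_dim))
b = np.zeros(4 * hidden_dim)
W_out = rng.normal(scale=0.1, size=(vocab_size, hidden_dim))  # word scores

h = np.zeros(hidden_dim)
c = np.zeros(hidden_dim)

# Step 0: the projected image feature is the first LSTM input.
x = W_img @ fc7
h, c = lstm_step(x, h, c, W, U, b)

END = 0      # hypothetical end-of-sentence word id
max_len = 5
words = []
for _ in range(max_len):
    word = int(np.argmax(W_out @ h))  # greedy choice of the next word
    words.append(word)
    if word == END:
        break
    # Feed the chosen word's embedding back in as the next input.
    h, c = lstm_step(W_emb[word], h, c, W, U, b)
```

In a trained model, `W_img`, `W_emb`, the LSTM weights, and `W_out` would be learned jointly from image and caption pairs, and the word ids would be mapped back to vocabulary strings.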

nakosung commented 8 years ago

You can unroll your minibatch into a series of training substeps: sum the deltas during the substeps, then backpropagate the accumulated gradient.
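The idea of accumulating gradients over substeps can be sketched in NumPy as follows. This is a minimal illustration on a toy linear model with a summed squared-error loss, not Caffe code. The function `grad_mse`, the data shapes, and the learning rate are assumptions made for the example.

```python
import numpy as np

def grad_mse(w, x, y):
    """Gradient of the summed squared error 0.5 * ||x @ w - y||^2 w.r.t. w."""
    return x.T @ (x @ w - y)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))  # minibatch of 8 examples, 3 features
y = rng.normal(size=(8,))
w = np.zeros(3)

# Gradient of the full minibatch, computed in one pass.
full_grad = grad_mse(w, x, y)

# Same minibatch split into 4 substeps of 2 examples each.
accum_grad = np.zeros_like(w)
for x_sub, y_sub in zip(np.split(x, 4), np.split(y, 4)):
    accum_grad += grad_mse(w, x_sub, y_sub)  # sum deltas, no update yet

# A single weight update using the accumulated gradient.
lr = 0.01
w_new = w - lr * accum_grad
```

Because the loss is a sum over examples, the accumulated gradient matches the full-minibatch gradient, so only one substep's worth of activations has to fit in memory at a time.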

ghost commented 8 years ago

@nakosung Would you please give me the implementation details of this? Sorry, but I am very new to LSTM.

nakosung commented 8 years ago

You can do substepping by setting solver.iter_size > 1. Unfortunately, I don't have a public example for a large LSTM.

junhyukoh / caffe-lstm

How do I provide 4096-dimensional feature vector to LSTM to get a sentence? #9