victorywys / RAVEN


Understanding the LSTM models #8

Open ShirinVafaei opened 2 years ago

ShirinVafaei commented 2 years ago

I have read your paper and it is very interesting. I understood that you used the last-layer features of the LSTM networks in combination with the language vectors. However, could you please tell me how you trained your LSTM networks? Was it a classification task? I mean, what were the labels while training the LSTM networks? Thank you :)

victorywys commented 2 years ago

Thanks for your interest in our work! The main LSTM is used in generally the same way as a normal LSTM, and you can train it in different ways for different tasks. For example, in the MOSI dataset the labels are real numbers indicating how positive the sentiment is, so we apply a fully connected layer to the LSTM state to get a 1-d number and then use an MSE loss for backpropagation. For other tasks such as classification, as you mention here, you can of course project the LSTM state to a suitable dimension (the number of categories) with a fully connected layer and train the model with a cross-entropy loss.
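
For illustration, here is a minimal PyTorch sketch of that setup (not the actual RAVEN code; the dimensions and names are placeholders): the final LSTM state is projected by a fully connected layer, trained with an MSE loss for MOSI-style regression or a cross-entropy loss for classification.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """Sketch: LSTM whose final hidden state is projected by a linear head."""
    def __init__(self, input_dim=300, hidden_dim=128, out_dim=1):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        _, (h_n, _) = self.lstm(x)        # h_n: (1, batch, hidden_dim)
        return self.fc(h_n.squeeze(0))    # (batch, out_dim)

# Regression on real-valued sentiment scores (MOSI-style labels):
model = LSTMRegressor(out_dim=1)
loss_fn = nn.MSELoss()
x, y = torch.randn(8, 20, 300), torch.randn(8, 1)  # dummy batch
loss = loss_fn(model(x), y)
loss.backward()

# For classification, set out_dim to the number of categories and use
# nn.CrossEntropyLoss() on the logits instead.
```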

ShirinVafaei commented 2 years ago

Thank you very much for your explanation!!

ShirinVafaei commented 2 years ago

So I have another question. Did you first train the LSTM networks and extract the features, and then separately train the gated modality-mixing and shifting parts on those LSTM features? Is that right? Or was the whole system (including the non-verbal subnetworks) trained as a whole? Thank you :)

victorywys commented 2 years ago

The whole system is trained together, since there are no intermediate labels available for us to train the parts separately.
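
As a rough sketch of what joint end-to-end training looks like (hypothetical stand-in modules and dimensions, not the actual RAVEN components): one optimizer covers all submodules, and the sentiment loss at the output back-propagates through every part, so no intermediate labels are needed.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the non-verbal subnetworks and the fusion head;
# the real RAVEN modules (gated modality mixing, shifting) differ.
visual_lstm   = nn.LSTM(47, 32, batch_first=True)
acoustic_lstm = nn.LSTM(74, 32, batch_first=True)
fusion_head   = nn.Linear(300 + 32 + 32, 1)

params = (list(visual_lstm.parameters()) + list(acoustic_lstm.parameters())
          + list(fusion_head.parameters()))
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()

# One joint training step on a dummy batch: the single loss at the output
# updates all submodules at once.
words    = torch.randn(8, 20, 300)   # dummy word embeddings
visual   = torch.randn(8, 20, 47)    # dummy visual features
acoustic = torch.randn(8, 20, 74)    # dummy acoustic features
labels   = torch.randn(8, 1)

_, (h_v, _) = visual_lstm(visual)
_, (h_a, _) = acoustic_lstm(acoustic)
pred = fusion_head(torch.cat([words[:, -1], h_v[0], h_a[0]], dim=-1))
loss = loss_fn(pred, labels)
optimizer.zero_grad()
loss.backward()          # gradients flow through every submodule
optimizer.step()
```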