mquad / hgru4rec

Code for our ACM RecSys 2017 paper "Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks"
MIT License

How to understand the loss function in the training process? #2

Closed xuChenSJTU closed 6 years ago

xuChenSJTU commented 6 years ago

Hi, I am not good at Theano, so I cannot understand this clearly. I am currently reproducing this paper in PyTorch, and I have some questions about the loss function:

1. Do you calculate the loss for each hidden state of the GRU, or only for the last state?
2. Do you treat this session-based recommendation task as a multi-class classification problem in which the number of classes is the number of items, with the BPR loss applied on top? If so, does this approach still work when the number of items is very large?

Thank you very much.

mquad commented 6 years ago

Hi, I understand, Theano code is not as easy to read at a glance as PyTorch... :-)

1. The loss is computed only from the hidden state of the session-GRU (`h_s` in the code). Optionally, the hidden state of the user-GRU (`h_u`) can be added to the mix, but it didn't help in our tests. And to answer your question directly: the loss is computed at every step in the sequence, not only at the last step, and it is averaged over all the outputs in a minibatch at each training step.

2. Session-based recommendation is framed here as next-item prediction, which in turn can be seen as single-label, multi-class classification where, at each step in a sequence, the following item ID in the sequence is the target class. The BPR loss (and the other losses) work with very large numbers of items thanks to negative output sampling. For more details, take a look at this paper.
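
Since you are working in PyTorch, here is a minimal sketch of that idea for a single step: a BPR loss where the true next item is ranked against a small set of sampled negatives instead of the full catalog. All names and sizes (`item_emb`, `h_s`, `n_neg`, ...) are illustrative placeholders, not taken from the hgru4rec code, and uniform negative sampling is just the simplest choice.

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores):
    # BPR: -log(sigmoid(s_pos - s_neg)), averaged over the sampled
    # negatives and over the minibatch.
    return -F.logsigmoid(pos_scores.unsqueeze(1) - neg_scores).mean()

# Toy shapes -- all hypothetical, not from the hgru4rec code.
batch, hidden, n_items, n_neg = 32, 100, 50000, 256

item_emb = torch.nn.Embedding(n_items, hidden)  # output item embeddings
h_s = torch.randn(batch, hidden)                # session-GRU state at one step
target = torch.randint(n_items, (batch,))       # true next item per sequence
negatives = torch.randint(n_items, (n_neg,))    # uniformly sampled negatives

pos_scores = (h_s * item_emb(target)).sum(-1)   # (batch,)
neg_scores = h_s @ item_emb(negatives).t()      # (batch, n_neg)

loss = bpr_loss(pos_scores, neg_scores)
loss.backward()
```

In training, this loss would be computed at every step of the session and averaged over the minibatch. GRU4Rec-style implementations often reuse the other targets in the minibatch as (part of) the negative samples, which amortizes the embedding lookups.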