mquad / hgru4rec

Code for our ACM RecSys 2017 paper "Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks"
MIT License

How to understand the loss function in the training process? #2

Closed xuChenSJTU closed 6 years ago

xuChenSJTU commented 6 years ago

Hi, I am not good at Theano, so I cannot understand this clearly. I am currently reproducing this paper in PyTorch, and I have some questions about the loss function:

1. Do you calculate the loss for each hidden state of the GRU, or only for the last state?
2. Do you treat this session-based recommendation task as a multi-class classification problem in which the number of classes is the number of items, with the BPR loss applied on top? If so, does this approach still work when the number of items is very large?

Thank you very much.

mquad commented 6 years ago

Hi, I understand, Theano code is not as easy to read at a glance as PyTorch... :-)

1. The loss is computed only from the hidden state of the session-GRU (`h_s` in the code). Optionally, the hidden state of the user-GRU (`h_u`) can be added to the mix, but it didn't help in our tests. And to answer your question directly: the loss is computed at every step in the sequence, not only at the last step, and it is averaged over all the outputs in a minibatch at each training step.

2. Session-based recommendation is framed here as next-item prediction, which in turn can be seen as single-label, multi-class classification where, at each step in a sequence, the following item ID in the sequence is the target class. The BPR loss (and the other losses) work with very large numbers of items thanks to negative output sampling. For more details, take a look at this paper.
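
Since you are working in PyTorch, here is a minimal sketch of that idea for a single step: a BPR loss where the true next item is ranked against a small set of sampled negatives instead of the full catalog. All names and sizes (`item_emb`, `h_s`, `n_neg`, ...) are illustrative placeholders, not taken from the hgru4rec code, and uniform negative sampling is just the simplest choice.

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores, neg_scores):
    # BPR: -log(sigmoid(s_pos - s_neg)), averaged over the sampled
    # negatives and over the minibatch.
    return -F.logsigmoid(pos_scores.unsqueeze(1) - neg_scores).mean()

# Toy shapes -- all hypothetical, not from the hgru4rec code.
batch, hidden, n_items, n_neg = 32, 100, 50000, 256

item_emb = torch.nn.Embedding(n_items, hidden)  # output item embeddings
h_s = torch.randn(batch, hidden)                # session-GRU state at one step
target = torch.randint(n_items, (batch,))       # true next item per sequence
negatives = torch.randint(n_items, (n_neg,))    # uniformly sampled negatives

pos_scores = (h_s * item_emb(target)).sum(-1)   # (batch,)
neg_scores = h_s @ item_emb(negatives).t()      # (batch, n_neg)

loss = bpr_loss(pos_scores, neg_scores)
loss.backward()
```

In training, this loss would be computed at every step of the session and averaged over the minibatch. GRU4Rec-style implementations often reuse the other targets in the minibatch as (part of) the negative samples, which amortizes the embedding lookups.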