
mquad / hgru4rec

Code for our ACM RecSys 2017 paper "Personalizing Session-based Recommendation with Hierarchical Recurrent Neural Networks"

MIT License


questions about the training process #3

Closed: lllmmmyyy closed this issue 6 years ago

lllmmmyyy commented 6 years ago

Hi, I'm not familiar with Theano, so I have some questions about the training process.

According to lines 919-936 of hgru4rec.py, the input length seems to be set to 1 in each mini-batch, which means each mini-batch consists of data from a single time step. In that case, can the error still be backpropagated through time?

mquad commented 6 years ago

Hi,

We are not using BPTT to train the network (sequences in recommender systems are rather short, and BPTT did not justify the additional complexity). I suggest you take a look at this paper to get a better idea of the training process. It is essentially the same; HGRU just uses an additional layer to keep track of the user at each step of the sessions included in the mini-batch.
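To illustrate what training without BPTT on one-step mini-batches can look like, here is a minimal NumPy sketch. It is not the code from hgru4rec.py: it assumes a plain tanh RNN cell instead of a GRU, a softmax cross-entropy loss instead of the ranking losses used in the repository, and the names `params`, `one_step_update`, and `sessions` are hypothetical. The point it shows is that the hidden state from the previous step is passed in as a plain value, so the gradient covers the current step only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, hidden = 5, 4

# Hypothetical parameters of a plain tanh RNN cell (an assumption, not the repo's GRU).
params = {
    "W_in": rng.normal(scale=0.1, size=(n_items, hidden)),   # input (item) weights
    "W_hh": rng.normal(scale=0.1, size=(hidden, hidden)),    # recurrent weights
    "W_out": rng.normal(scale=0.1, size=(hidden, n_items)),  # output weights
}


def one_step_update(params, h_prev, item_in, item_out, lr=0.1):
    """Train on a mini-batch that holds a single time step per session.

    h_prev is treated as a constant: the gradient is computed for this
    step only and is never propagated into earlier steps (no BPTT).
    """
    rows = np.arange(len(item_in))

    # Forward pass for one time step.
    h = np.tanh(params["W_in"][item_in] + h_prev @ params["W_hh"])
    logits = h @ params["W_out"]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(probs[rows, item_out]).mean()

    # Backward pass, stopping at h_prev.
    d_logits = probs.copy()
    d_logits[rows, item_out] -= 1.0
    d_logits /= len(item_in)
    d_pre = (d_logits @ params["W_out"].T) * (1.0 - h ** 2)
    grad_in = np.zeros_like(params["W_in"])
    np.add.at(grad_in, item_in, d_pre)  # accumulate rows for repeated items

    params["W_out"] -= lr * (h.T @ d_logits)
    params["W_hh"] -= lr * (h_prev.T @ d_pre)
    params["W_in"] -= lr * grad_in
    return h, loss


# Two sessions processed in parallel, one click per session per mini-batch.
sessions = [[0, 1, 2, 3], [4, 3, 2, 1]]
h = np.zeros((len(sessions), hidden))  # state starts at zero for each new session
losses = []
for t in range(3):
    item_in = np.array([s[t] for s in sessions])
    item_out = np.array([s[t + 1] for s in sessions])
    h, loss = one_step_update(params, h, item_in, item_out)
    losses.append(loss)
```

The hidden state still carries information forward across mini-batches, so the network sees the session history at prediction time; only the gradient is cut at each step.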

lllmmmyyy commented 6 years ago

Hi, I've examined the training process in the paper "Session-based Recommendations with RNN", but I think there is a difference between the two papers. In HGRU, there is an additional layer to keep track of each user across different sessions. Since your training process works with the additional layer, does it mean that most users in the data set have only a few sessions?

mquad commented 6 years ago

That's right, most users had few sessions (5/10). Nevertheless, in other experiments (not reported in the paper) we saw that the model also behaves well with users who have more sessions (up to 20-30), even though training doesn't use BPTT. Interestingly, in the scenarios we tested (similar to video recommendation), the gain of HGRU over GRU grows with the number of sessions in the user history. Hope it helps.

Massimo

lllmmmyyy commented 6 years ago

Hmm, that's really interesting. Have you tried using BPTT in HGRU when users have up to 20-30 sessions? If so, does training without BPTT perform better than training with BPTT?

mquad commented 6 years ago

No, I did not. I cannot help you with that, sorry.