meiwang-ut opened this issue 6 years ago
This is correct: the entire dataset is moved to the GPU.
I think this is reasonable because recommender datasets are normally quite small: as you say yourself, the largest Movielens dataset occupies only ~1 gigabyte of GPU memory. In return, we get better performance by avoiding the need to move data to the GPU on every minibatch.
If your data is very large, you can split it into smaller parts and feed them to the fit method one at a time, in effect doing the batching yourself outside of the main model.
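A minimal sketch of that chunking approach, assuming Spotlight's sequence API (get_movielens_dataset, Interactions, to_sequence, ImplicitSequenceModel) and assuming that repeated fit calls continue training the same parameters rather than re-initialising the model; the chunk count and hyperparameters are illustrative only:

```python
import numpy as np

from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.interactions import Interactions
from spotlight.sequence.implicit import ImplicitSequenceModel

dataset = get_movielens_dataset(variant='20M')

model = ImplicitSequenceModel(embedding_dim=32,
                              batch_size=16,
                              n_iter=1,
                              use_cuda=True)

# Split the users into chunks so that only one chunk's sequences need to be
# on the GPU at any one time; 10 chunks is an arbitrary illustrative choice.
n_chunks = 10
user_chunks = np.array_split(np.unique(dataset.user_ids), n_chunks)

for epoch in range(5):
    for chunk in user_chunks:
        mask = np.isin(dataset.user_ids, chunk)
        chunk_interactions = Interactions(
            dataset.user_ids[mask],
            dataset.item_ids[mask],
            timestamps=dataset.timestamps[mask],
            num_users=dataset.num_users,
            num_items=dataset.num_items,
        )
        # Assumes calling fit again keeps training the existing parameters
        # instead of re-initialising the model.
        model.fit(chunk_interactions.to_sequence(max_sequence_length=200))
```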
I found that the GPU memory usage is much larger than the theoretical embedding and model size. It looks as if the entire sequence dataset produced by the Interactions.to_sequence method is somehow being loaded into GPU memory, which does not seem reasonable. Has anyone else noticed this problem?
For example, run the demo script movielens_sequence.py with the largest dataset ('20M') to see the problem more clearly, with parameters embedding_dim = 32, batch_size = 16, max_sequence_length = 200, item_num = 26,745.
Actual memory usage on a single GPU:
Epoch 1: 789 MB
Epoch 2: 1021 MB
Every later epoch: 1021 MB
So, theoretically, the memory usage of the embeddings should be: item_num × embedding_dim × 4 B = 26,745 × 32 × 4 B = 3,423,360 B ≈ 3.4 MB. The memory usage of the model layers (LSTM/GRU) is hard to calculate precisely, but it should not be that large. After all, one batch of training data is only: batch_size × embedding_dim × max_sequence_length × 4 B = 16 × 32 × 200 × 4 B = 409,600 B ≈ 409.6 KB. However, the size of the training sequence data is: item_num × embedding_dim × max_sequence_length × 4 B = 26,745 × 32 × 200 × 4 B = 684,672,000 B ≈ 684 MB.
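For reference, the same back-of-the-envelope arithmetic written out as a quick sanity check (assuming 4 bytes per float32 element, numbers taken directly from the parameters above):

```python
# Reproducing the rough memory estimates from the text (4 bytes per float32).
item_num = 26_745
embedding_dim = 32
batch_size = 16
max_sequence_length = 200

embedding_table = item_num * embedding_dim * 4                          # ~3.4 MB
one_batch = batch_size * embedding_dim * max_sequence_length * 4        # ~410 KB
all_sequences_embedded = item_num * embedding_dim * max_sequence_length * 4  # ~685 MB

print(embedding_table, one_batch, all_sequences_embedded)
# 3423360 409600 684672000
```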
Let's look at the problem with another set of parameters: embedding_dim = 256, batch_size = 256, max_sequence_length = 200, item_num = 26,745.
Actual memory usage on a single GPU:
Epoch 1: 869 MB
Epoch 2: 1101 MB
Every later epoch: 1101 MB
We can see that even though embedding_dim increases 8× and batch_size increases 16×, the total memory usage only grows by about 80 MB. This suggests that the dataset itself was probably loaded into GPU memory.
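One way to verify where the memory is going, assuming a CUDA build of PyTorch, is to snapshot the allocator statistics around the fit call (the model and sequences names below stand for whatever Spotlight model and sequence data you are fitting; the exact byte counts reported depend on the PyTorch version):

```python
import torch

def report(tag):
    # memory_allocated counts bytes in tensors currently held by PyTorch;
    # max_memory_allocated is the peak allocation observed so far.
    print('{}: allocated {:.1f} MB, peak {:.1f} MB'.format(
        tag,
        torch.cuda.memory_allocated() / 1024 ** 2,
        torch.cuda.max_memory_allocated() / 1024 ** 2))

report('before fit')
model.fit(sequences)  # `model` / `sequences`: your Spotlight model and sequence data
report('after fit')
```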
The second issue is that memory usage increases considerably after the first training epoch, as shown above.
Because of these two problems, my own data cannot be trained on the GPU: training fails with an 'OUT OF MEMORY' error. Do you have any idea what the problem is and how to fix it?
Thanks a lot.