Closed Dellen closed 6 years ago
What's strange is, shouldn't that number be the batch_size? I don't understand why the total number of training samples would show up.
Found it. It's because in the init phase the code runs the whole dataset through sess.run(). But now I wonder why the AFM code works fine, since it does the same thing in its init phase.
Now I understand: AFM doesn't need to do the square calculation.
self.summed_features_emb = tf.reduce_sum(self.nonzero_embeddings, 1, keep_dims=True) # None * 1 * K
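For context, FM's pairwise-interaction term is what needs that square calculation: it uses the sum-then-square identity 0.5 * ((Σᵢ vᵢ)² − Σᵢ vᵢ²), which materializes both the summed embedding and its elementwise square for every sample. A minimal pure-Python sketch of the identity (illustrative only, not the repo's TF code; `fm_pairwise` is a hypothetical name):

```python
# Sketch of FM's sum-then-square trick for one sample:
# 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), computed elementwise over K dims.
def fm_pairwise(embeddings):
    # embeddings: list of K-dim vectors, one per non-zero feature
    k = len(embeddings[0])
    summed = [sum(v[j] for v in embeddings) for j in range(k)]        # sum_i v_i
    squared_sum = [s * s for s in summed]                             # (sum_i v_i)^2
    sum_squared = [sum(v[j] ** 2 for v in embeddings) for j in range(k)]  # sum_i v_i^2
    return [0.5 * (a - b) for a, b in zip(squared_sum, sum_squared)]
```

For two features it reduces to the elementwise product of their embeddings, e.g. `fm_pairwise([[1, 2], [3, 4]])` gives `[3.0, 8.0]`. Batched over the whole dataset, each of these intermediates is a `[num_samples, K]` tensor.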
Now the problem is how I can run FM on my machine with an 8GB GPU. What is your environment setup for this?
I solved the problem by changing the evaluate function to feed the data in batches. Now it works fine.
Here is my command:
python FM.py --dataset ml-tag --epoch 100 --pretrain -1 --batch_size 4096 --hidden_factor 256 --lr 0.01 --keep 0.7
But I got an Out Of Memory error as follows: OOM when allocating tensor with shape[1404801,256]
What's strange is that when I run AFM, the command works just fine. And the command with the frappe dataset works fine as well.
python AFM.py --dataset ml-tag --epoch 100 --pretrain 2 --batch_size 4096 --hidden_factor [256,256] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1
I looked it up and found that 1404801 is the number of training samples. I think it's because of this code:
self.summed_features_emb = tf.reduce_sum(self.nonzero_embeddings, 1, keep_dims=True) # None * 1 * K
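For scale, a rough back-of-the-envelope estimate (assuming float32, 4 bytes per element) shows why a single tensor of the shape in the OOM message is already heavy, and FM needs several intermediates of that shape when the whole training set is fed at once:

```python
# Rough memory estimate for one tensor of shape [1404801, 256] in float32.
num_samples, hidden_factor = 1404801, 256
bytes_per_float = 4  # assumption: float32
gib = num_samples * hidden_factor * bytes_per_float / 2**30
print(round(gib, 2))  # ~1.34 GiB per intermediate of this shape
```

A few such intermediates, plus the embedding table and TF's own workspace, can plausibly exhaust an 8GB GPU.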
Still trying to figure out what went wrong. I hope someone can give me a clue.