Closed Dellen closed 6 years ago
What's strange is, shouldn't that number be the batch_size? I don't understand why the total number of training samples would show up.
Found it. It's because in the init phase the code runs the whole dataset through sess.run(). But now I wonder why the AFM code works fine, since it does the same thing in its init phase.
Now I understand: AFM doesn't need to do the square calculation.
self.summed_features_emb = tf.reduce_sum(self.nonzero_embeddings, 1, keep_dims=True) # None * 1 * K
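For context, FM's pairwise-interaction term is what needs that square calculation: it uses the sum-then-square identity 0.5 * ((Σᵢ vᵢ)² − Σᵢ vᵢ²), which materializes both the summed embedding and its elementwise square for every sample. A minimal pure-Python sketch of the identity (illustrative only, not the repo's TF code; `fm_pairwise` is a hypothetical name):

```python
# Sketch of FM's sum-then-square trick for one sample:
# 0.5 * ((sum_i v_i)^2 - sum_i v_i^2), computed elementwise over K dims.
def fm_pairwise(embeddings):
    # embeddings: list of K-dim vectors, one per non-zero feature
    k = len(embeddings[0])
    summed = [sum(v[j] for v in embeddings) for j in range(k)]        # sum_i v_i
    squared_sum = [s * s for s in summed]                             # (sum_i v_i)^2
    sum_squared = [sum(v[j] ** 2 for v in embeddings) for j in range(k)]  # sum_i v_i^2
    return [0.5 * (a - b) for a, b in zip(squared_sum, sum_squared)]
```

For two features it reduces to the elementwise product of their embeddings, e.g. `fm_pairwise([[1, 2], [3, 4]])` gives `[3.0, 8.0]`. Batched over the whole dataset, each of these intermediates is a `[num_samples, K]` tensor.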
Now the problem is how I can run FM on my machine with an 8GB GPU. What is your environment setup for this?
I solved the problem by changing the evaluate function to feed the data in batches. Now it works fine.
Here is my command:
python FM.py --dataset ml-tag --epoch 100 --pretrain -1 --batch_size 4096 --hidden_factor 256 --lr 0.01 --keep 0.7
But I got an Out Of Memory error as follows: OOM when allocating tensor with shape[1404801,256]
What's strange is that when I run AFM, the command works just fine. And the command with the frappe dataset works fine as well.
python AFM.py --dataset ml-tag --epoch 100 --pretrain 2 --batch_size 4096 --hidden_factor [256,256] --keep [1.0,0.5] --lamda_attention 100.0 --lr 0.1
I looked it up and found that 1404801 is the number of training samples. I think it's because of this code:
self.summed_features_emb = tf.reduce_sum(self.nonzero_embeddings, 1, keep_dims=True) # None * 1 * K
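For scale, a rough back-of-the-envelope estimate (assuming float32, 4 bytes per element) shows why a single tensor of the shape in the OOM message is already heavy, and FM needs several intermediates of that shape when the whole training set is fed at once:

```python
# Rough memory estimate for one tensor of shape [1404801, 256] in float32.
num_samples, hidden_factor = 1404801, 256
bytes_per_float = 4  # assumption: float32
gib = num_samples * hidden_factor * bytes_per_float / 2**30
print(round(gib, 2))  # ~1.34 GiB per intermediate of this shape
```

A few such intermediates, plus the embedding table and TF's own workspace, can plausibly exhaust an 8GB GPU.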
Still trying to figure out what went wrong. I hope someone can give me a clue.