Closed: Eilons closed this issue 4 years ago
Hi, both models were trained on the same number of examples and have the same MRR@10. The main difference is that the one trained for 100k iterations with batch size 128 was trained on a TPU v3, which is faster but only available on Google Cloud. The other model (400k iterations with batch size 32) was trained on a TPU v2, which is 2x slower but is available for free on Colab.
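For anyone who wants to check the "same number of examples" claim, here is a quick sketch of the arithmetic. The dictionary keys and variable names are only illustrative and do not come from the training code; the step counts and batch sizes are the ones quoted above.

```python
# Total training examples seen = training steps * batch size.
# Labels are illustrative; the numbers are the ones quoted in this thread.
configs = {
    "TPU v3 (100k steps, batch 128)": (100_000, 128),
    "TPU v2 (400k steps, batch 32)": (400_000, 32),
}

examples_seen = {
    name: steps * batch_size for name, (steps, batch_size) in configs.items()
}

for name, total in examples_seen.items():
    print(f"{name}: {total:,} examples")
```

Both configurations come out to 12,800,000 examples, so the 4x difference in steps is offset by the 4x difference in batch size.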
Hi!
It seems that the trained model provided at https://drive.google.com/file/d/1crlASTMlsihALlkabAQP6JTYIZwC1Wm8/view was trained using 100K training steps, not 400K as stated in the paper. Which model was used in your experiments, and how many training steps did it use?
Thanks!