nyu-dl / dl4marco-bert

BSD 3-Clause "New" or "Revised" License

The trained model used in the submission #38

Closed by Eilons 4 years ago

Eilons commented 4 years ago

Hi!

It seems that the trained model provided at https://drive.google.com/file/d/1crlASTMlsihALlkabAQP6JTYIZwC1Wm8/view was trained for 100K training steps, not the 400K stated in the paper. Which model was used in your experiments, and how many training steps did it use?

Thanks!

rodrigonogueira4 commented 4 years ago

Hi, both models were trained on the same number of examples and have the same MRR@10. The main difference is that the one trained for 100k iterations with batch size 128 was trained on a TPU v3, which is faster but only available on Google Cloud. The other model (400k iterations with batch size 32) was trained on a TPU v2, which is 2x slower but available for free on Colab.
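As a quick sanity check of the "same number of examples" point, the arithmetic can be sketched as below. This assumes every training step consumes exactly one full batch (no gradient accumulation), which the thread does not state explicitly. The helper name `examples_seen` is made up for illustration and is not part of the repository.

```python
def examples_seen(steps: int, batch_size: int) -> int:
    """Total training examples processed, assuming one full batch per step."""
    return steps * batch_size


# Figures quoted in the reply above.
tpu_v3_run = examples_seen(steps=100_000, batch_size=128)  # TPU v3 checkpoint
tpu_v2_run = examples_seen(steps=400_000, batch_size=32)   # TPU v2 checkpoint

print(tpu_v3_run, tpu_v2_run)  # prints "12800000 12800000"
assert tpu_v3_run == tpu_v2_run
```

Under that assumption both runs see 12.8 million examples: the batch size is 4x larger in one run and the step count is 4x larger in the other.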