When reproducing the experiment of pretraining on mag240m and evaluating on arxiv, we found that the Contrastive baseline achieves performance similar to Prodigy when the aux loss is applied (using `-attr 1000`) and the number of training steps is increased to 50,010.
The confusing test accuracy results on the arxiv dataset are:

| way | Contrastive | Prodigy |
|----:|------------:|--------:|
| 3   | 74.92       | 73.09   |
| 5   | 63.81       | 61.52   |
| 10  | 49.77       | 46.74   |
| 20  | 37.62       | 34.41   |
| 40  | 27.85       | 25.13   |
The checkpoint of the contrastive model we obtained from pretraining on mag240m is attached. Could you please clarify whether there is anything wrong with our experimental setup? Thank you!
The complete training command we used is

The evaluation command is
state_dict_49000.ckpt.zip