snap-stanford / prodigy


Baseline issue #4

Open uebian opened 3 weeks ago

uebian commented 3 weeks ago

When reproducing the experiment of pretraining on mag240m and evaluating on arxiv, we found that the Contrastive baseline achieves performance similar to Prodigy when the auxiliary loss is applied (using -attr 1000) and the number of training steps is increased to 50,010.

The complete training command we used is

python run_single_experiment.py --dataset mag240m --root /datasets --original_features True --input_dim 768 --emb_dim 256 -ds_cap 50010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,A -lr 5e-4 -way 30 -shot 3 -qry 4 -eval_step 500 -task cls_nm_sb -bs 1 -aug ND0.5,NZ0.5 -aug_test True -attr 1000 --device 0 --prefix MAG_Contrastive

The evaluation command is

python run_single_experiment.py --dataset arxiv --root /datasets --emb_dim 256 --input_dim 768 -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,A -way 3 -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens -pretrained state_dict_49000.ckpt --eval_only True --train_cap 10 --device 0

The test accuracies on the arxiv dataset, which we find confusing, are:

| way | Contrastive | Prodigy |
| --- | ----------- | ------- |
| 3   | 74.92       | 73.09   |
| 5   | 63.81       | 61.52   |
| 10  | 49.77       | 46.74   |
| 20  | 37.62       | 34.41   |
| 40  | 27.85       | 25.13   |
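For reference, a small sketch (plain Python, values copied from the table above) that tabulates the per-way accuracy gap; it shows Contrastive leading Prodigy by roughly 2–3 points at every way:

```python
# Test accuracy (%) on arxiv, copied from the table above: way -> (Contrastive, Prodigy).
results = {
    3: (74.92, 73.09),
    5: (63.81, 61.52),
    10: (49.77, 46.74),
    20: (37.62, 34.41),
    40: (27.85, 25.13),
}

# Absolute gap (Contrastive minus Prodigy) at each way.
gaps = {way: round(c - p, 2) for way, (c, p) in results.items()}

for way, gap in gaps.items():
    print(f"{way}-way: Contrastive leads by {gap:.2f} points")
```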

The checkpoint of the Contrastive model obtained from pretraining on mag240m is attached. Could you please clarify whether anything is wrong with our experimental setup? Thank you!

state_dict_49000.ckpt.zip

Dorbmon commented 3 weeks ago

It is quite weird. I encountered a similar problem here.

Dorbmon commented 1 week ago

@q-hwang Hi, could you help clarify the results we got?