timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Reproduce the result of the RTE task #48

Closed: li-ziang closed this issue 2 years ago

li-ziang commented 2 years ago

Hi, I am trying to reproduce the result of the RTE task. I followed the method in https://github.com/timoschick/pet/issues/4#issuecomment-707031187 and got the following output in my terminal:

2021-08-28 14:39:43,537 - INFO - modeling - --- RESULT (pattern_id=0, iteration=0) ---
2021-08-28 14:39:43,537 - INFO - modeling - {'acc': 0.7075812274368231}
2021-08-28 14:39:43,557 - INFO - modeling - === OVERALL RESULTS ===
2021-08-28 14:39:43,558 - INFO - modeling - acc-p0: 0.7075812274368231 +- 0
2021-08-28 14:39:43,558 - INFO - modeling - acc-all-p: 0.7075812274368231 +- 0

Since I used the --do_eval argument, the result should be the accuracy on the dev set, but the result shown in Table 1 of your paper is 69.8 (mine is 70.8). Did I use the correct training arguments? Here is my command for training:

python cli.py \
--method pet \
--pattern_ids 0 1 2 3 \
--data_dir $PET/dataset/RTE \
--model_type albert \
--model_name_or_path $PETmodels/albert-xxlarge-v2 \
--task_name rte \
--output_dir ./rte-output-large-serious-2 \
--lm_training \
--pet_per_gpu_train_batch_size 2 \
--pet_gradient_accumulation_steps 8 \
--pet_max_steps 250 \
--sc_per_gpu_unlabeled_batch_size 2 \
--sc_gradient_accumulation_steps 8 \
--do_train \
--do_eval \
--sc_max_steps 5000
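
(With these settings, each optimizer step sees an effective batch of pet_per_gpu_train_batch_size × pet_gradient_accumulation_steps = 2 × 8 = 16 examples, assuming a single GPU; the sc_* stage accumulates to the same size.)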

If I want to reproduce the results of the other 7 tasks in Table 1 of your paper, can I simply change the --task_name argument?
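
For example, for BoolQ I would guess something like this (just my assumption; I am not sure the task name, the dataset folder, or the valid --pattern_ids for this task are right):

python cli.py \
--method pet \
--pattern_ids 0 1 2 3 \
--data_dir $PET/dataset/BoolQ \
--model_type albert \
--model_name_or_path $PETmodels/albert-xxlarge-v2 \
--task_name boolq \
--output_dir ./boolq-output \
--lm_training \
--pet_per_gpu_train_batch_size 2 \
--pet_gradient_accumulation_steps 8 \
--pet_max_steps 250 \
--sc_per_gpu_unlabeled_batch_size 2 \
--sc_gradient_accumulation_steps 8 \
--do_train \
--do_eval \
--sc_max_steps 5000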

Thanks in advance!

timoschick commented 2 years ago

Hi @li-ziang, you can find the exact commands that we've used for RTE and all other SuperGLUE tasks here: https://github.com/timoschick/pet/issues/19#issuecomment-747483960. One reason for your different results may be that we did not use auxiliary language modeling (i.e., we didn't set --lm_training), because with it we ran into memory issues for some tasks. Let me know if you're unable to reproduce our results with the commands linked above.
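
(Concretely, for RTE that would just be the command you posted without the --lm_training line; the batch sizes and step counts in the linked comment may also differ from yours, so it's worth double-checking those.)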

li-ziang commented 2 years ago

> Hi @li-ziang, you can find the exact commands that we've used for RTE and all other SuperGLUE tasks here: #19 (comment) […]

Many thanks @timoschick, I will give it a try.

li-ziang commented 2 years ago

Hi @timoschick, I have tried the command for RTE in https://github.com/timoschick/pet/issues/19#issuecomment-747483960, and I got the following output:

2021-09-12 10:28:52,495 - INFO - modeling - --- RESULT (pattern_id=0, iteration=0) ---
2021-09-12 10:28:52,495 - INFO - modeling - {'acc': 0.7075812274368231}
2021-09-12 10:28:53,692 - INFO - modeling - === OVERALL RESULTS ===
2021-09-12 10:28:53,692 - INFO - modeling - acc-p0: 0.7075812274368231 +- 0
2021-09-12 10:28:53,692 - INFO - modeling - acc-all-p: 0.7075812274368231 +- 0

It seems this is the correct result. Thanks again!

timoschick commented 2 years ago

Great, thanks for letting me know :)