timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Different performance when using Master and v1.1.0 #67

Closed · cylnlp closed this issue 2 years ago

cylnlp commented 2 years ago

Hi @timoschick, I was experimenting with RoBERTa in the zero-shot setting, using the commands provided on this page, and found that the performance varies considerably between versions. Using the master code on MNLI, I obtained:

acc-p0: 0.3593845526798451 +- 0
acc-p1: 0.3588750764214388 +- 0
acc-p2: 0.4338699816588547 +- 0
acc-p3: 0.42103117994701444 +- 0
acc-all-p: 0.39329019767678824 +- 0.0397922701833213

But with the v1.1.0 code, the performance is only:

acc-p0: 0.3273894436519258 +- 0
acc-p1: 0.32932545343387 +- 0
acc-p2: 0.3273894436519258 +- 0
acc-p3: 0.32789891991033215 +- 0
acc-all-p: 0.3280008151620134 +- 0.0009151683707158276

Do you have any idea why these results differ?

Thanks, Yulong

cylnlp commented 2 years ago

The respective commands are below.

Master:

python3 cli.py \
--method pet \
--pattern_ids 0 1 2 3 \
--data_dir ../mnli \
--model_type roberta \
--model_name_or_path roberta-large \
--task_name mnli \
--output_dir mnli-roberta-large \
--no_distillation \
--do_eval \
--pet_repetitions 1

and v1.1.0:

python3 run_training.py \
--wrapper_type mlm \
--train_examples 100 \
--data_dir ../mnli \
--model_type roberta \
--model_name_or_path roberta-large \
--task_name mnli \
--output_dir mnli-roberta-large \
--do_train \
--do_eval \
--max_steps 0 \
--repetitions 1 \
--pattern_ids 0 1 2 3

timoschick commented 2 years ago

Hi @cylnlp, your results for the v1.1.0 code look really odd. Note that --max_steps only overrides --num_train_epochs if it is set to a value greater than 0, so you need to set --num_train_epochs 0 in the v1.1.0 example. However, training on 100 examples should actually improve performance, not make it worse. Could you still verify what happens if you set --num_train_epochs 0 in the v1.1.0 example?
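Concretely, a sketch of the adjusted v1.1.0 invocation with --num_train_epochs 0 added; all other flags are kept exactly as in your original command (whether --do_train combined with zero epochs cleanly skips training in v1.1.0 is an assumption, so treat this as untested):

python3 run_training.py \
--wrapper_type mlm \
--train_examples 100 \
--data_dir ../mnli \
--model_type roberta \
--model_name_or_path roberta-large \
--task_name mnli \
--output_dir mnli-roberta-large \
--do_train \
--do_eval \
--max_steps 0 \
--num_train_epochs 0 \
--repetitions 1 \
--pattern_ids 0 1 2 3

With both --max_steps 0 and --num_train_epochs 0, no training steps should run, so the evaluation should reflect the untrained (zero-shot) model.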

cylnlp commented 2 years ago

Okay. Thanks, @timoschick.