timoschick / pet

This repository contains the code for "Exploiting Cloze Questions for Few-Shot Text Classification and Natural Language Inference"
https://arxiv.org/abs/2001.07676
Apache License 2.0

Cannot reproduce result on MNLI #30

Closed: MatthewCYM closed this issue 3 years ago

MatthewCYM commented 3 years ago

Hi,

I am trying to reproduce your roberta-base result on MNLI. The result listed in the paper is 55.1, while I only get 33.7.

I ran the code with

python3 cli.py \
    --method pet \
    --pattern_ids 0 1 2 3 \
    --data_dir $DATA_DIR \
    --model_type roberta \
    --model_name_or_path roberta-base \
    --task_name mnli \
    --output_dir result/mnli \
    --lm_training \
    --alpha 1e-4 \
    --pet_per_gpu_train_batch_size 1 \
    --pet_per_gpu_unlabeled_batch_size 3 \
    --pet_max_seq_length 256 \
    --pet_gradient_accumulation_steps 4 \
    --pet_max_steps 1000 \
    --learning_rate 1e-5 \
    --sc_max_seq_length 256 \
    --sc_per_gpu_train_batch_size 4 \
    --sc_gradient_accumulation_steps 4 \
    --sc_num_train_epochs 3 \
    --train_examples 50 \
    --unlabeled_examples 30000 \
    --split_examples_evenly \
    --do_train \
    --do_eval

which strictly follows the settings mentioned in your paper. Could you please tell me how to fix it?

Regards, Matthew

timoschick commented 3 years ago

Hi @MatthewCYM,

I think there are three differences regarding the settings:

1) alpha in our implementation is actually 1 - alpha in the paper (this is something I should definitely fix but haven't had the time to do yet). So if you want alpha = 1e-4 as in the paper, you actually need to set alpha = 1 - 1e-4 = 0.9999 (the default value). I would assume that this is the main reason for the performance difference.

2) We use fewer unlabeled examples (see Section B.2 of the paper).

3) We train the final model not for 3 epochs but for 5000 steps (see Table 5 in the paper). That is, you should set --sc_max_steps 5000 instead of --sc_num_train_epochs 3.

Additionally, as roberta-base requires much less memory, we actually didn't use gradient accumulation and instead directly set --pet_per_gpu_train_batch_size 4 --pet_per_gpu_unlabeled_batch_size 12, but this shouldn't have any impact on the results.

If fixing those three things still doesn't give you results similar to those reported in the paper, please let me know. Finally, if you want to reproduce the exact results from the paper, you may need to use v1.1.0 (--branch v1.1.0). Some things like random seed initialization and dataset shuffling are implemented a little bit differently in the current version.
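
For reference, here is a sketch of your invocation with the three points above applied, plus the batch-size change instead of gradient accumulation for PET. The --unlabeled_examples value is just copied from your command; it still needs to be set according to Section B.2:

python3 cli.py \
    --method pet \
    --pattern_ids 0 1 2 3 \
    --data_dir $DATA_DIR \
    --model_type roberta \
    --model_name_or_path roberta-base \
    --task_name mnli \
    --output_dir result/mnli \
    --lm_training \
    --alpha 0.9999 \
    --pet_per_gpu_train_batch_size 4 \
    --pet_per_gpu_unlabeled_batch_size 12 \
    --pet_max_seq_length 256 \
    --pet_max_steps 1000 \
    --learning_rate 1e-5 \
    --sc_max_seq_length 256 \
    --sc_per_gpu_train_batch_size 4 \
    --sc_gradient_accumulation_steps 4 \
    --sc_max_steps 5000 \
    --train_examples 50 \
    --unlabeled_examples 30000 \
    --split_examples_evenly \
    --do_train \
    --do_eval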

MatthewCYM commented 3 years ago

Hi,

Thank you for the answer! Another issue I encountered is that when I train iPET on MNLI with a single pattern, it gives me the following error:

Traceback (most recent call last):
  File "cli.py", line 282, in <module>
    main()
  File "cli.py", line 266, in main
    pet.train_ipet(pet_model_cfg, pet_train_cfg, pet_eval_cfg, ipet_cfg, sc_model_cfg, sc_train_cfg, sc_eval_cfg,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 191, in train_ipet
    generate_ipet_train_sets(train_data=train_data, unlabeled_data=unlabeled_data,
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 679, in generate_ipet_train_sets
    subdir_train_set = generate_ipet_train_set(
  File "/home/jiadong/yiming/pet/pet/modeling.py", line 723, in generate_ipet_train_set
    logits = np.average(logits, axis=0, weights=weights)
  File "<__array_function__ internals>", line 5, in average
  File "/home/yiming/anaconda3/envs/pet/lib/python3.8/site-packages/numpy/lib/function_base.py", line 409, in average
    raise ZeroDivisionError(
ZeroDivisionError: Weights sum to zero, can't be normalized
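
For what it's worth, the numpy part of the error is easy to reproduce on its own; my guess (not verified against modeling.py) is that with a single pattern the weights passed to np.average all end up zero:

import numpy as np

# Standalone illustration of the library behaviour from the traceback;
# the shapes here are made up and have nothing to do with pet itself.
logits = np.zeros((2, 4, 3))   # e.g. 2 models, 4 examples, 3 classes
weights = np.zeros(2)          # weights summing to zero

try:
    np.average(logits, axis=0, weights=weights)
except ZeroDivisionError as e:
    print(e)  # Weights sum to zero, can't be normalized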

Do we have to use multiple patterns for iPET?

Regards, Matthew

timoschick commented 3 years ago

Yes, iPET requires at least two different patterns.
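
In case it helps, a minimal way to rerun this would be to pass at least two pattern ids together with the ipet method, e.g. something like the sketch below (the output directory is just an example, and all remaining flags fall back to their defaults, so you would still want the adjustments discussed above):

python3 cli.py \
    --method ipet \
    --pattern_ids 0 1 \
    --data_dir $DATA_DIR \
    --model_type roberta \
    --model_name_or_path roberta-base \
    --task_name mnli \
    --output_dir result/mnli-ipet \
    --do_train \
    --do_eval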