Closed by rabeehk 2 years ago
Hi @rabeehk, did you use the exact command that you've posted here? I'm asking because there's a very important space missing. Your command is:
[...] --output_dir /idiap/temp/rkarimi/temp/experiments/boolq/roberta/supervised--do_train [...]
when it should be
[...] --output_dir /idiap/temp/rkarimi/temp/experiments/boolq/roberta/supervised --do_train [...]
With the former command, no training is performed at all (so you're basically getting zero-shot results) and the outputs are written to a directory called /idiap/temp/rkarimi/temp/experiments/boolq/roberta/supervised--do_train. With the latter command, training is performed and outputs are written to /idiap/temp/rkarimi/temp/experiments/boolq/roberta/supervised.
Let me know if this fixes your problem!
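To see why the missing space matters, here is a minimal argparse sketch (a hypothetical parser for illustration, not PET's actual cli.py): without the space, the shell passes supervised--do_train as a single token, so it becomes part of the --output_dir value and the --do_train flag is never set.

```python
import argparse

# Hypothetical parser mimicking the two relevant arguments.
parser = argparse.ArgumentParser()
parser.add_argument("--output_dir")
parser.add_argument("--do_train", action="store_true")

# Missing space: "--do_train" is swallowed into the directory value.
broken = parser.parse_args(["--output_dir", "out/supervised--do_train"])
print(broken.do_train, broken.output_dir)  # False out/supervised--do_train

# With the space, the flag is actually set.
fixed = parser.parse_args(["--output_dir", "out/supervised", "--do_train"])
print(fixed.do_train, fixed.output_dir)  # True out/supervised
```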
With the following command
!python cli.py \
  --method pet \
  --pattern_ids 0 1 2 3 4 \
  --data_dir ./fewglue/FewGLUE/BoolQ \
  --model_type albert \
  --model_name_or_path albert-base-v2 \
  --task_name boolq \
  --output_dir /tmp/pet \
  --do_train \
  --do_eval \
  --pet_per_gpu_eval_batch_size 8 \
  --pet_per_gpu_train_batch_size 2 \
  --pet_gradient_accumulation_steps 8 \
  --pet_max_steps 250 \
  --pet_max_seq_length 256 \
  --sc_per_gpu_train_batch_size 2 \
  --sc_per_gpu_unlabeled_batch_size 2 \
  --sc_gradient_accumulation_steps 8 \
  --sc_max_steps 5000 \
  --sc_max_seq_length 256
and with the following data point distribution
!wc -l fewglue/FewGLUE/BoolQ/*
   32 fewglue/FewGLUE/BoolQ/train.jsonl
 9427 fewglue/FewGLUE/BoolQ/unlabeled.jsonl
 3270 fewglue/FewGLUE/BoolQ/val.jsonl
I got the following results:
acc-p0: 0.5426095820591234 +- 0.0072968522602133755
acc-p1: 0.5753312945973497 +- 0.008348866556347945
acc-p2: 0.5363914373088685 +- 0.006885828898591881
acc-p3: 0.5649337410805301 +- 0.007555023022910462
acc-p4: 0.5482161060142712 +- 0.02863262975653322
acc-all-p: 0.5534964322120286 +- 0.019335782158363467
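As a side note, acc-all-p here appears to be simply the mean of the five per-pattern accuracies, which you can check directly:

```python
# Per-pattern accuracies copied from the output above.
accs = [
    0.5426095820591234,  # p0
    0.5753312945973497,  # p1
    0.5363914373088685,  # p2
    0.5649337410805301,  # p3
    0.5482161060142712,  # p4
]
overall = sum(accs) / len(accs)
print(overall)  # ≈ 0.5535, i.e. the acc-all-p value above
```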
I think I'm doing something wrong somewhere, because I expected a score of roughly ~79.0.
Hi @savasy, if your aim is to reproduce our results, you're using the wrong (much smaller) language model: our experiments are conducted with albert-xxlarge-v2, whereas you are using albert-base-v2 (see also our paper, @rabeehk's command above, or this thread).
Hi Timo, thank you! I realized I was using the 64-sample dataset. In the end, I also reimplemented the whole code base, and everything works fine now. Thanks a lot. Best, Rabeeh
Hi, I am following [1] and running the command below, but I am not able to reproduce the RTE results. I would really appreciate any suggestions. Thanks.
I am getting the following results:
[1] https://github.com/timoschick/pet/issues/19#issuecomment-747483960