yizhongw / Tk-Instruct

Tk-Instruct is a Transformer model that is tuned to solve many NLP tasks by following instructions.
https://arxiv.org/abs/2204.07705
MIT License

[Question] parameters for performance reproduction in paper #10

Closed · kuri-leo closed this issue 2 years ago

kuri-leo commented 2 years ago

Hello Yizhong and everyone!

Thanks for your great work and contribution. While attempting to replicate the performance in Fig. 5b of the paper with its settings, I found a gap, and I was wondering if you could share some experience on this.

I ran scripts/train_tk_instruct.sh four times, changing only --max_num_instances_per_task or --seed.

I simply copied data/splits/default/test_tasks.txt into data/splits/default/dev_tasks.txt and kept the default settings for everything else. I'm not sure whether the parameters in scripts/train_tk_instruct.sh are the settings used in the paper, and I'm hoping you can kindly offer some suggestions.

Thanks in advance!

Cheers, Leo

yizhongw commented 2 years ago

Hi Leo, thanks for reporting your results. May I know how many GPUs you used? This matters because the effective batch size is per_device_train_batch_size x num_gpus x gradient_accumulation_steps. In my experiments, I used 8 A100 GPUs, which results in a batch size of 16.
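
As a quick sanity check, the arithmetic is just the product of those three values (a minimal sketch; the per-device and accumulation values below are placeholders, and only their product of 16 reflects my setup, so check scripts/train_tk_instruct.sh for the actual flags):

```python
# Effective (global) batch size under multi-GPU training.
# Placeholder values: only the product (16) matches my actual setup.
per_device_train_batch_size = 2
num_gpus = 8
gradient_accumulation_steps = 1

effective_batch_size = per_device_train_batch_size * num_gpus * gradient_accumulation_steps
print(effective_batch_size)  # 16 with these placeholder values
```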

kuri-leo commented 2 years ago

Hey Yizhong,

Thanks for your rapid reply :-)

In my test I used only one A100 for debugging, so I will try 8 GPUs and share the results later.

Thank you again and have a nice day!

Cheers, Leo

kuri-leo commented 2 years ago

Hi Yizhong,

Thanks for your hints. I have successfully run two tests and got 48.5 (sampling 8 times) and 55.1235 (sampling 64 times), matching or even beating the 48.5 and 54.7 reported in the paper.

That's amazing!

Leo

Yufang-Liu commented 1 year ago

Hi everyone,

It's strange that I can only get 48 on eight 3090 GPUs without changing any parameters. Does anyone know the possible reason?

kuri-leo commented 1 year ago

@Yufang-Liu

Hi Yufang,

Many factors could lead to variations in scores; in my previous tests it came down to the batch size. I'd also suggest checking whether any optimizations such as half precision or ZeRO have been auto-enabled via the DeepSpeed or accelerate packages.
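
For example, something like this can surface those settings (a minimal sketch; the flag names mirror transformers.TrainingArguments, which this repo's trainer builds on, but verify against your own launch command):

```python
# Run this with the same flags you pass to the training script, e.g.:
#   python check_args.py --output_dir output
from transformers import HfArgumentParser, TrainingArguments

(args,) = HfArgumentParser(TrainingArguments).parse_args_into_dataclasses()

print("fp16:", args.fp16)
print("bf16:", args.bf16)
print("deepspeed config:", args.deepspeed)  # a ZeRO stage would be set in this file
```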

Hope this may help.

Leo

Yufang-Liu commented 1 year ago


Hi Leo, thanks a lot for your helpful suggestions!!

I found that the cause was the versions of the installed packages: with matching package versions, I got the same results on eight 3090 GPUs. I'm still not sure which package affects the performance.
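
For anyone comparing environments, printing the exact versions on both machines is a quick way to diff them (a minimal sketch; the package list is just a guess at the likely culprits):

```python
# Print pinned-style versions of packages that plausibly affect results,
# so two environments can be diffed line by line.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "datasets", "deepspeed", "accelerate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```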