Hi Leo, thanks for reporting your results. May I know how many GPUs you used? This is important because the real batch size is `per_device_train_batch_size x num_gpus x accumulation_steps`. In my experiment, I used 8 A100 GPUs, which results in a batch size of 16.
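As a quick worked example of that formula, here is a minimal sketch; the per-device and accumulation values are assumptions chosen to be consistent with the 8-GPU, batch-size-16 setup, not confirmed script defaults:

```python
# Effective batch size = per-device batch size x num GPUs x accumulation steps.
# The first and last values below are illustrative assumptions.
per_device_train_batch_size = 2
num_gpus = 8
gradient_accumulation_steps = 1

print(per_device_train_batch_size * num_gpus * gradient_accumulation_steps)  # 16
```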
Hey Yizhong,
Thanks for your rapid reply :-)
In my test, I only used one A100 for debugging, so I will try 8 GPUs and share the results later.
Thank you again and have a nice day!
Cheers, Leo
Hi Yizhong,
Thanks for your hints. I have successfully run two tests and obtained results of 48.5 (sampling 8 instances per task) and 55.1235 (sampling 64 instances), matching or even beating the 48.5 and 54.7 reported in the paper, respectively.
That's amazing!
Leo
Hi everyone,
It's strange that I can only get 48 on 8 3090 GPUs without changing any parameters. Does anyone know the possible reason?
@Yufang-Liu
Hi Yufang,
Given the multitude of factors that could lead to variations in scores, in my previous test it came down to the `batch_size`. Further, I suggest examining whether any optimization measures such as half-precision or ZeRO have been auto-applied via the DeepSpeed or accelerate packages; a quick way to check is sketched below.
Hope this may help.
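A minimal sketch, assuming the training script builds standard HuggingFace `TrainingArguments` (adapt to however it actually parses its arguments):

```python
# Inspect the TrainingArguments in effect to see whether mixed precision
# or a DeepSpeed config (where ZeRO is configured) was silently enabled.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="output")  # stand-in; use the script's parsed args
print("fp16:", args.fp16)            # True if half-precision is on
print("bf16:", args.bf16)            # True if bfloat16 is on
print("deepspeed:", args.deepspeed)  # DeepSpeed/ZeRO config path, if any
```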
Leo
Hi Leo, thanks a lot for your helpful suggestions!
I found that the reason is the version of the installed packages: with the same package versions, I got the same results on 8 3090 GPUs. I'm still not sure which package affects the performance.
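In case it helps others hit by the same issue, a small sketch for recording the versions of the usual suspects in this training stack (the actual culprit package is still unknown):

```python
# Log package versions alongside each run so scores can be compared
# across environments.
import torch
import transformers
import datasets

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
```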
Hello Yizhong and everyone!
Thanks for your great work and contribution. While attempting to replicate the performance in Fig. 5b of the paper with its settings, I found a gap, and I was wondering if you could share some experience on this.
I attempted four runs of `scripts/train_tk_instruct.sh`, changing only `--max_num_instances_per_task` or `--seed`:

- `--max_num_instances_per_task 8`: reports `train/predict_rougeL 45.866`; Fig. 5b reports 48.5
- `--max_num_instances_per_task 8` and `--seed 1337`: reports `train/predict_rougeL 46.762`
- `--max_num_instances_per_task 64`: reports `train/predict_rougeL 49.6898`; Fig. 5b reports 54.7
- `--max_num_instances_per_task 100` (default): reports `train/predict_rougeL 49.3467`
I simply copied `data/splits/default/test_tasks.txt` into `data/splits/default/dev_tasks.txt` (sketched below) while keeping the default settings for everything else. I'm not sure whether the parameters in `scripts/train_tk_instruct.sh` are the default settings from the paper, and I'm hoping you can kindly offer some suggestions. Thanks in advance!
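For reference, that copy step is equivalent to this minimal sketch (paths as in the repo's data layout):

```python
# Reuse the test split as the dev split for evaluation.
import shutil

shutil.copyfile("data/splits/default/test_tasks.txt",
                "data/splits/default/dev_tasks.txt")
```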
Cheers, Leo