Fine-tuning is very slow and the bottleneck seems to be data processing. Is there a good way to fix this?

wangbq18 commented 3 years ago

As in the title. Training log:

[1,0]:06/29/2021 02:02:29 - INFO - main - Starting training...
[1,0]:06/29/2021 02:02:29 - INFO - main - Running training with 2 GPUs
[1,0]:06/29/2021 02:02:29 - INFO - main - Single-GPU Non-Accumulated batch size = 32
[1,0]:06/29/2021 02:02:29 - INFO - main - max_n_example_per_group = 1
[1,0]:06/29/2021 02:02:29 - INFO - main - Accumulate steps = 1
[1,0]:06/29/2021 02:02:29 - INFO - main - Total batch size = #GPUs * Single-GPU batch size * max_n_example_per_group * Accumulate steps = 64
[1,0]:06/29/2021 02:02:29 - INFO - main - Total #epochs = 15
[1,0]:06/29/2021 02:02:29 - INFO - main - Total #steps = 11910
[1,0]:06/29/2021 02:02:29 - INFO - main - Validate every 800 steps, in total 15 times
0%| | 44/11910 [05:29<24:21:55, 7.39s/it][1,0]:
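To put the log in perspective, here is a rough back-of-the-envelope calculation. Every number is copied from the log above; the variable names are illustrative only and are not ClipBERT configuration keys. It assumes the 7.39 s/it rate from the progress bar stays constant for the whole run.

```python
# Values copied from the training log above; names are illustrative only.
n_gpus = 2
per_gpu_batch_size = 32
max_n_example_per_group = 1
accumulate_steps = 1
total_steps = 11910
seconds_per_step = 7.39  # rate reported by the progress bar

# Total batch size, following the formula printed in the log.
total_batch_size = (
    n_gpus * per_gpu_batch_size * max_n_example_per_group * accumulate_steps
)

# Projected wall-clock time if the step rate does not change.
estimated_hours = total_steps * seconds_per_step / 3600

# Examples consumed per second across both GPUs.
examples_per_second = total_batch_size / seconds_per_step

print(f"total batch size: {total_batch_size}")
print(f"estimated training time: {estimated_hours:.1f} h")
print(f"throughput: {examples_per_second:.1f} examples/s")
```

At this rate the full 15 epochs would take roughly a day, at fewer than 10 examples per second, which is why the data pipeline looks like the suspect.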

jayleicn commented 3 years ago

Hi @wangbq18,

We noticed this issue as well, but did not have a good solution at the time. The main bottleneck is image or video preprocessing; DALI might help speed this up. Please let us know if you are able to make some progress here.
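One way to check whether preprocessing really dominates is to time how long the training loop waits for each batch compared with how long the step itself takes. The sketch below is not ClipBERT code. `measure_wait_fraction`, `slow_batches`, and `fast_step` are hypothetical names, and the sleeps stand in for frame decoding and the forward/backward pass.

```python
import time


def measure_wait_fraction(batches, train_step):
    """Return the fraction of wall time spent waiting for the next batch."""
    wait_time = 0.0
    step_time = 0.0
    iterator = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # blocks while the loader prepares data
        except StopIteration:
            break
        t1 = time.perf_counter()
        # On a GPU, kernels run asynchronously, so a real training step would
        # need a synchronization call (e.g. torch.cuda.synchronize()) before
        # the second timestamp for this measurement to be meaningful.
        train_step(batch)
        t2 = time.perf_counter()
        wait_time += t1 - t0
        step_time += t2 - t1
    total = wait_time + step_time
    return wait_time / total if total > 0 else 0.0


# Hypothetical stand-ins for a real data loader and training step.
def slow_batches(n, delay=0.02):
    for i in range(n):
        time.sleep(delay)  # pretend to decode and resize video frames
        yield i


def fast_step(batch):
    time.sleep(0.002)  # pretend to run forward and backward passes


fraction = measure_wait_fraction(slow_batches(5), fast_step)
print(f"{fraction:.0%} of the time was spent waiting for data")
```

If most of the time goes to waiting, more loader workers, cached pre-extracted frames, or GPU-side decoding are the places to look. If not, the bottleneck is elsewhere.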

Best, Jie

wangbq18 commented 2 years ago

This is an automatic vacation reply from QQ Mail. Hello! I have received your email.

jayleicn / ClipBERT

Fine-tuning is very slow and the bottleneck seems to be data processing. Is there a good way to fix this? #19