Marshall-yao opened this issue 5 years ago
Just python train.py -opt options/train/train_EDVR_M.yml, that is the command I am using for one GPU.
@yaolugithub young666 has provided the command for training with one GPU. Thanks, @young666.
If you do not change the batch size in the config file, each iteration takes longer when you use fewer GPUs, because each GPU must process more samples per iteration (for example, with a total batch size of 32, each GPU handles 4 samples on 8 GPUs but 16 samples on 2 GPUs). I think that is why the performance does not decrease when you use two GPUs compared with eight GPUs.
Thanks very much, @young666. Before you gave your answer, I used the command python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/train_EDVR_M.yml --launcher pytorch and changed n_workers to 0 (the initial value is 3) to train the network.
@xinntao Yes, I did not change the batch size when I trained the code on two Titan X GPUs. It takes about 12 days for 600k iterations.
Besides: 1) When I trained on one GPU, which command is proper, mine or young666's? 2) When I used the above command to train, I changed the batch size to 16 because of CUDA out-of-memory errors, and the learning rate from 4e-4 to 2e-4. Is that learning rate proper?
Thanks .
@yaolugithub If you aren't sure which is proper, you could open the code, run each command, and see how the model is trained in each case.
@yaolugithub 1) The command provided by young666 is better, I think. 2) The original setting is batch size 32 on eight GPUs, so each GPU has 4 samples. If the memory allows, you can set 8 or more samples per GPU. I think you can keep the learning rate, but it's better to run a comparison to see which one is better.
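A common heuristic when shrinking the total batch size is the linear scaling rule: scale the learning rate proportionally to the batch size. A minimal sketch, assuming the default EDVR-M setting of batch size 32 with learning rate 4e-4 (the helper name is hypothetical, not from the repository):

```python
# Linear scaling rule: the learning rate scales with the total batch size.
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    return base_lr * new_batch / base_batch

# EDVR-M default: total batch size 32, learning rate 4e-4.
print(scaled_lr(4e-4, 32, 16))  # -> 2e-4, matching the setting tried above
```

This is only a rule of thumb; as suggested above, a short comparison run at each learning rate is the safer check.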
Thanks, Xintao. I will have a try with the learning rate comparison.
@young666
Thanks very much.
You said that you used the command python train.py -opt options/train/train_EDVR_M.yml to train.
Could you tell me whether you also changed n_workers from 3 to 0?
@young666 I have used your training command to train. It trains more quickly with n_workers = 3 than with 0.
@yaolugithub You should read the PyTorch documentation to understand what num_workers is and how many workers are enough.
@yaolugithub n_workers means the number of data-loading workers for each GPU. Empirically, you do not need to change it when you use a different number of GPUs for training.
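For context, n_workers in the config maps to the num_workers argument of PyTorch's DataLoader, which controls how many subprocesses prefetch batches in parallel. A minimal sketch with a stand-in dataset (the random tensors are placeholders, not the EDVR training data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in EDVR this would be the REDS video dataset.
dataset = TensorDataset(torch.randn(100, 3, 64, 64))

# num_workers > 0 spawns that many worker processes to prefetch batches;
# num_workers = 0 loads every batch in the main process, which is slower.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=3)

for (batch,) in loader:
    pass  # a training step would go here
```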
@young666 Thanks very much. The reason I changed num_workers from 3 to 0 is that the code got stuck while running. So I googled the problem, and someone said that changing num_workers to 0 can avoid this situation.
@xinntao Thanks very much for your reply.
Situation:
I trained the code with the command python train.py -opt options/train/train_EDVR_M.yml or python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/train_EDVR_M.yml --launcher pytorch and kept num_workers = 3 (unchanged) on one GPU.
But both runs got stuck at certain iterations. I then consulted some documents and set num_workers = 0, which solved the problem.
However, after the change the training time is about 100 s, while it was about 60 s before.
Do you know how to improve the training speed?
@yaolugithub I think the bottleneck is the I/O speed if you set num_workers to 0. You'd better get the multi-process data loader working.
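One workaround that is often suggested for data loaders that hang with num_workers > 0, and that may apply here because the EDVR loaders decode frames with OpenCV, is to disable OpenCV's internal threading, which can deadlock inside forked worker processes. A hedged sketch, not confirmed as the fix for this exact hang:

```python
import cv2

# OpenCV's own thread pool can deadlock when the DataLoader forks
# worker processes; restricting it to the calling thread is a
# commonly suggested workaround for stuck workers.
cv2.setNumThreads(0)
cv2.ocl.setUseOpenCL(False)

# Alternatively, switching the multiprocessing start method from
# 'fork' to 'spawn' avoids inheriting locked state in the workers:
# import torch.multiprocessing as mp
# mp.set_start_method('spawn', force=True)
```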
Thanks, Xintao.
Yes, num_workers is the number of worker processes the data loader uses to load batches, so we'd better set it larger than zero.
But then the training process gets stuck. Maybe it can be solved by modifying the code, but I do not know how to modify it.
Hi, Xintao. Is the training command python -m torch.distributed.launch --nproc_per_node=1 --master_port=4321 train.py -opt options/train/train_EDVR_M.yml --launcher pytorch OK for training on one GPU using the pretrained model from train_EDVR_woTSA_M.yml?
I have used this command to train on two GPUs (nproc_per_node=2, everything else unchanged) and the performance is close to the results in the paper.
So I have this question. Looking forward to your reply.
Best regards.
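For reference, loading the woTSA model as an initialization is configured through the path section of the options YAML. A minimal sketch following the key names used in the EDVR option files (the model path itself is a placeholder):

```yaml
# Excerpt from options/train/train_EDVR_M.yml; the path is a placeholder.
path:
  pretrain_model_G: ../experiments/pretrained_models/EDVR_woTSA_M.pth
  strict_load: false  # the TSA module is newly added, so load non-strictly
  resume_state: ~
```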