Accelerate the Training Process

JiYuanFeng commented 3 years ago

Hi Yang Song, thanks for your nice work.

I tried to reproduce the experiment "configs/subvp/cifar10_ncsnpp_continuous.py" on a single V100 with a batch size of 128. However, I found the training too slow: so far, 100K iterations have taken around 23 hours.

Can an experiment with a larger batch size, run on multiple GPUs, achieve the same performance? At your convenience, could you share the config for the multi-GPU CIFAR-10 experiment?

Many thanks for your help.

yang-song commented 3 years ago

Yes, the performance shouldn't depend on the number of GPUs used for training. With a batch size of 128, the training should finish in about 3 days on 4 V100s.

The original config was actually already designed for training on multiple GPUs.
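As a rough sanity check, the timing figures in this thread can be compared with a back-of-the-envelope projection. The sketch below uses only the reported 23 hours per 100K iterations. The total iteration count (`TOTAL_ITERS`) and the ideal linear speedup across GPUs are assumptions that are not stated in this thread, so check the `training.n_iters` value in your own config before relying on the result.

```python
def hours_per_iteration(hours_elapsed: float, iterations_done: int) -> float:
    """Average wall-clock hours spent per training iteration."""
    return hours_elapsed / iterations_done


def projected_days(total_iterations: int, hours_per_iter: float, speedup: float = 1.0) -> float:
    """Projected training time in days, given a per-iteration cost and a speedup factor."""
    return total_iterations * hours_per_iter / speedup / 24.0


# Reported in the thread: 100K iterations took about 23 hours on one V100.
rate = hours_per_iteration(23.0, 100_000)

# ASSUMPTION: hypothetical total iteration count, not stated in the thread.
TOTAL_ITERS = 1_300_000

# Projection for a single GPU.
single_gpu_days = projected_days(TOTAL_ITERS, rate)

# ASSUMPTION: ideal linear scaling on 4 GPUs (real scaling is usually a bit worse).
four_gpu_days = projected_days(TOTAL_ITERS, rate, speedup=4.0)

print(f"single GPU: {single_gpu_days:.1f} days, 4 GPUs: {four_gpu_days:.1f} days")
```

Under these assumptions the projection comes to roughly 12.5 days on one GPU and a little over 3 days on four, which is in line with the estimate given above.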

JiYuanFeng commented 3 years ago

Thank you for the information!

yang-song / score_sde_pytorch