JoJoliking opened this issue 4 years ago
I have just used DistributedDataParallel to train on a single machine with multiple GPUs over the last few days. I suggest you look at the PyTorch official website guidelines. I did it the standard way, which saves as many model files as the number of processes I create, and got roughly a 40% speedup. You may also need to take care of the synchronized batch normalization (SyncBatchNorm) problem.
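For reference, a minimal sketch of that kind of single-machine, multi-GPU DDP setup, assuming it is launched with torchrun and that MyModel / MyDataset are placeholders for your own network and data (not anything from this repo):

```python
# Launch with: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU, env:// init via torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = MyModel().cuda(local_rank)               # MyModel / MyDataset are placeholders
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # sync BN stats across GPUs
    model = DDP(model, device_ids=[local_rank])

    dataset = MyDataset()
    sampler = DistributedSampler(dataset)            # each process sees a different shard
    loader = DataLoader(dataset, batch_size=12, sampler=sampler, num_workers=8)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(30):
        sampler.set_epoch(epoch)                     # reshuffle shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()                          # gradients are all-reduced across processes
            optimizer.step()

    if dist.get_rank() == 0:                         # save once, instead of one file per process
        torch.save(model.module.state_dict(), "model.pth")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Saving only on rank 0 avoids the one-checkpoint-per-process situation mentioned above.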
Thank you for your answer. Should I keep using the same parameters when training a new model? epochs=30, batch_size=12, num_workers=8
Using two GPUs in parallel, I found that more memory was used, so I reduced the batch_size. I think you can simply treat it as multi-process training, with the loss computed in parallel automatically.
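As a hedged illustration of how the per-GPU batch size relates to what a single-GPU run would use (the target of 24 below is only an example, not a number from this thread):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

# DDP runs one process per GPU, each holding its own model replica and activations,
# so per-GPU memory grows with the per-process batch_size. The effective batch size
# is per-process batch_size * world_size, so to match a single-GPU run you can
# divide the original batch size by the number of GPUs.
def make_loader(dataset, global_batch_size=24):       # assumes the process group is initialized
    world_size = dist.get_world_size()                # e.g. 2 when training on two GPUs
    per_gpu_batch = global_batch_size // world_size   # -> 12 per GPU with 2 GPUs
    sampler = DistributedSampler(dataset)
    return DataLoader(dataset, batch_size=per_gpu_batch,
                      sampler=sampler, num_workers=8)
```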
Thank you very much!
@tangsipeng
I recently want to change from DP to DDP and add torch.cuda.amp. Could you give me some guidance? Could I have your WeChat? My WeChat ID is wxid_m2ixgzhk34hf22
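Not anyone's exact setup here, but a minimal sketch of what adding torch.cuda.amp to a DDP training step can look like; model, optimizer, and loader are placeholders for a setup like the one sketched earlier in the thread:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Assumes `model` is already wrapped in DistributedDataParallel and `loader`
# uses a DistributedSampler; names are placeholders.
scaler = GradScaler()

for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    optimizer.zero_grad()
    with autocast():                                  # forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                     # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                            # unscales grads, then optimizer.step()
    scaler.update()                                   # adjust the loss scale for the next step
```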
Sorry, I still haven't figured it out, so I can't give you any tips.
Hello: I have tried to use your function: model = DataParallel(model) or model = DataParallel(model, device_ids=0, output_device=0), but both forms give errors, and I also cannot save the trained model parameters. How should I set this up on 4 GPUs so that I can save my own trained parameters?
Thank you for reading.
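In case it helps, a sketch of a working 4-GPU DataParallel setup plus parameter saving (MyModel is a placeholder); the key points are that device_ids must be a list of GPU indices, and that the wrapped network's weights live under model.module:

```python
import torch
from torch.nn import DataParallel

model = MyModel().cuda(0)                             # MyModel is a placeholder for your network
# device_ids must be a list of GPU indices, not a single int; outputs gather on GPU 0
model = DataParallel(model, device_ids=[0, 1, 2, 3], output_device=0)

# ... training loop ...

# DataParallel wraps the original network as model.module, so save that state_dict
# to get a checkpoint that loads into a plain (non-parallel) model later.
torch.save(model.module.state_dict(), "model_params.pth")

# Loading back on a single device:
new_model = MyModel()
new_model.load_state_dict(torch.load("model_params.pth", map_location="cpu"))
```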