JoJoliking opened this issue 4 years ago
I have just used DistributedDataParallel to train on a single machine with multiple GPUs over the last few days. I suggest you look at the PyTorch official website guidelines. I did it the standard way, which saves as many model files as the number of processes I create, and got roughly a 40% speedup. You may also need to take care of the synchronized batch normalization (SyncBatchNorm) problem.
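For reference, a minimal sketch of that kind of single-machine, multi-GPU DDP setup, assuming it is launched with torchrun and that MyModel / MyDataset are placeholders for your own network and data (not anything from this repo):

```python
# Launch with: torchrun --nproc_per_node=NUM_GPUS train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU, env:// init via torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = MyModel().cuda(local_rank)               # MyModel / MyDataset are placeholders
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)  # sync BN stats across GPUs
    model = DDP(model, device_ids=[local_rank])

    dataset = MyDataset()
    sampler = DistributedSampler(dataset)            # each process sees a different shard
    loader = DataLoader(dataset, batch_size=12, sampler=sampler, num_workers=8)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(30):
        sampler.set_epoch(epoch)                     # reshuffle shards every epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            loss = torch.nn.functional.cross_entropy(model(x), y)
            optimizer.zero_grad()
            loss.backward()                          # gradients are all-reduced across processes
            optimizer.step()

    if dist.get_rank() == 0:                         # save once, instead of one file per process
        torch.save(model.module.state_dict(), "model.pth")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Saving only on rank 0 avoids the one-checkpoint-per-process situation mentioned above.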
Thank you for your answer. Should I keep using the same parameters when training a new model? epochs=30, batch_size=12, num_workers=8
Using two GPUs in parallel, I found that more memory was used, so I reduced the batch_size. I think you can simply treat it as multi-process training, with the loss computed in parallel automatically.
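As a hedged illustration of how the per-GPU batch size relates to what a single-GPU run would use (the target of 24 below is only an example, not a number from this thread):

```python
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

# DDP runs one process per GPU, each holding its own model replica and activations,
# so per-GPU memory grows with the per-process batch_size. The effective batch size
# is per-process batch_size * world_size, so to match a single-GPU run you can
# divide the original batch size by the number of GPUs.
def make_loader(dataset, global_batch_size=24):       # assumes the process group is initialized
    world_size = dist.get_world_size()                # e.g. 2 when training on two GPUs
    per_gpu_batch = global_batch_size // world_size   # -> 12 per GPU with 2 GPUs
    sampler = DistributedSampler(dataset)
    return DataLoader(dataset, batch_size=per_gpu_batch,
                      sampler=sampler, num_workers=8)
```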
Thank you very much!
@tangsipeng
I recently want to change from DP to DDP and add torch.cuda.amp. Could you give me some guidance? Could I have your WeChat? My WeChat ID is wxid_m2ixgzhk34hf22
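Not anyone's exact setup here, but a minimal sketch of what adding torch.cuda.amp to a DDP training step can look like; model, optimizer, and loader are placeholders for a setup like the one sketched earlier in the thread:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

# Assumes `model` is already wrapped in DistributedDataParallel and `loader`
# uses a DistributedSampler; names are placeholders.
scaler = GradScaler()

for x, y in loader:
    x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
    optimizer.zero_grad()
    with autocast():                                  # forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                     # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                            # unscales grads, then optimizer.step()
    scaler.update()                                   # adjust the loss scale for the next step
```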
Sorry, I still haven't figured it out, so I can't give you any tips.
Hello: I have tried to use your function: model = DataParallel(model) or model = DataParallel(model, device_ids=0, output_device=0), but both forms give errors, and I also cannot save the trained model parameters. How should I set this up on 4 GPUs so that I can save my own trained parameters?
Thank you for reading.
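In case it helps, a sketch of a working 4-GPU DataParallel setup plus parameter saving (MyModel is a placeholder); the key points are that device_ids must be a list of GPU indices, and that the wrapped network's weights live under model.module:

```python
import torch
from torch.nn import DataParallel

model = MyModel().cuda(0)                             # MyModel is a placeholder for your network
# device_ids must be a list of GPU indices, not a single int; outputs gather on GPU 0
model = DataParallel(model, device_ids=[0, 1, 2, 3], output_device=0)

# ... training loop ...

# DataParallel wraps the original network as model.module, so save that state_dict
# to get a checkpoint that loads into a plain (non-parallel) model later.
torch.save(model.module.state_dict(), "model_params.pth")

# Loading back on a single device:
new_model = MyModel()
new_model.load_state_dict(torch.load("model_params.pth", map_location="cpu"))
```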