sconlyshootery / FeatDepth

This is the official code for the method described in "Feature-metric Loss for Self-supervised Learning of Depth and Egomotion".
MIT License

Problems using DDP #80

Closed WYZKLNH closed 2 years ago

WYZKLNH commented 2 years ago

I ran into a tough problem when running this line in trainer.py/_dist_train: model = MMDistributedDataParallel(model.cuda())

It worked after modifying it like this: model = MMDistributedDataParallel(model.cuda(), device_ids=[local_rank], find_unused_parameters=True)

This is because the model must be assigned to a specific GPU when running in parallel; otherwise it is replicated on all GPUs, the effective batch size changes, and an error occurs when computing the backprojection. Why didn't other people run into this problem?
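The fix described above can be sketched as follows. This is a minimal sketch, assuming MMDistributedDataParallel behaves like PyTorch's DistributedDataParallel (which it wraps); the wrap_for_ddp helper and the single-process gloo setup are illustrative, not the repo's actual code:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_for_ddp(model: nn.Module, local_rank: int) -> nn.Module:
    """Wrap a model for one-GPU-per-process training (hypothetical helper)."""
    if torch.cuda.is_available():
        # Pin this process to exactly one GPU. Without device_ids, DDP's
        # legacy single-process mode replicates the model on every visible
        # GPU and splits each batch across them, which is what broke the
        # backprojection step.
        torch.cuda.set_device(local_rank)
        model = model.cuda(local_rank)
        return DDP(model, device_ids=[local_rank], find_unused_parameters=True)
    # CPU fallback (gloo backend) so the sketch runs anywhere.
    return DDP(model, find_unused_parameters=True)


# Single-process setup purely for demonstration; a real run launches one
# process per GPU, each with its own local_rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

net = wrap_for_ddp(nn.Linear(4, 2), local_rank=0)
out = net(torch.randn(3, 4))
# The batch dimension stays 3: nothing was silently split across devices.
dist.destroy_process_group()
```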

sconlyshootery commented 2 years ago

You can use CUDA_VISIBLE_DEVICES to control which GPUs will be used.
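As an illustration of that suggestion (the launch command below is a hypothetical example, not the repo's documented invocation):

```shell
# A process only sees the devices listed here, renumbered from 0, so
# inside this process physical GPU 2 appears as cuda:0.
export CUDA_VISIBLE_DEVICES=2

# A typical distributed launch instead starts one process per visible
# GPU, each with its own local rank, e.g.:
#   CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
#       --nproc_per_node=4 train.py
echo "$CUDA_VISIBLE_DEVICES"
```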

WYZKLNH commented 2 years ago

> You can use CUDA_VISIBLE_DEVICES to control which GPUs will be used.

But I want to use all of the GPUs, one GPU per process. Maybe device_ids=[local_rank] is the only solution?