wenwenyu / PICK-pytorch

Code for the paper "PICK: Processing Key Information Extraction from Documents using Improved Graph Learning-Convolutional Networks" (ICPR 2020)
https://arxiv.org/abs/2004.07464
MIT License

DistributedDataParallel device_ids and output_device arguments only work with single-device CUDA modules #21

Closed tengerye closed 4 years ago

tengerye commented 4 years ago

Hi, I encountered a problem with your code:

AssertionError: DistributedDataParallel device_ids and output_device arguments only work
with single-device CUDA modules, but got device_ids [0], output_device 0, and
module parameters {device(type='cuda', index=0), device(type='cpu')}.

But if I commented

if self.config['trainer']['sync_batch_norm']:
    self.model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(self.model)

of your trainer.py, the code runs without errors. I used the recommended settings. May I ask why?

wenwenyu commented 4 years ago

SyncBatchNorm only works in one-GPU-per-process mode, per the official PyTorch requirements. In other words, the nproc_per_node and local_world_size args must be the same in the train command args.

So the error occurs because you set the sync_batch_norm arg to true in config.json, but nproc_per_node and local_world_size were different when you ran the command. Please check again.
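For reference, a minimal sketch of the one-GPU-per-process pattern that SyncBatchNorm requires. The names here (`local_rank`, the toy model) are illustrative, not the repo's actual code; the DDP wrap is left commented out because it needs an initialized process group and a CUDA device:

```python
import torch
import torch.nn as nn

# Illustrative only: in a real launch, local_rank comes from the launcher.
local_rank = 0

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
# Replace every BatchNorm layer with SyncBatchNorm.
# The conversion itself runs fine on CPU; the forward pass needs GPUs.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# With one GPU per process, all parameters end up on a single CUDA device,
# so device_ids=[local_rank] is valid and the assertion in the traceback
# above does not fire:
# model = model.to(local_rank)
# model = torch.nn.parallel.DistributedDataParallel(
#     model, device_ids=[local_rank], output_device=local_rank)
```

If the process ends up holding parameters on more than one device (e.g. part of the model still on CPU), DDP raises exactly the AssertionError quoted at the top of this issue.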

tengerye commented 4 years ago

Hi @wenwenyu, thanks a lot for your kind reply. I did set nproc_per_node and local_world_size to the same value: both are 1.

wenwenyu commented 4 years ago

Try setting the number of devices in CUDA_VISIBLE_DEVICES to be the same as nproc_per_node.
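Concretely, that means all three numbers should agree. This is a hedged sketch of such a launch command; the exact train.py flags (e.g. --local_world_size) are assumed from this thread and may differ from the repo's README:

```shell
# Two visible GPUs -> nproc_per_node=2 -> local_world_size=2 (all three match)
export CUDA_VISIBLE_DEVICES=0,1
python -m torch.distributed.launch --nproc_per_node=2 \
    train.py -c config.json --local_world_size 2
```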

wenwenyu commented 4 years ago

I am going to close this issue. Please feel free to reopen or create a new one if you have more questions.