Closed zhanghua7099 closed 3 years ago
The single-GPU version works fine. This error only occurs in the multi-GPU version.
I wrote the code for distributed training but never tested it. I will fix it today.
@zhanghua7099 I have updated the repo. Please pull the latest version and try again, and reply here if you hit any other error. I don't have a multi-GPU system, so I can't test it myself.
The updated version runs well with multi-GPU.
Thank you for your excellent work!
Hi!
I have four 2080 Ti GPUs and want to use them to train the SuperGlue model. I tried to run the following command:
python3 -m torch.distributed.launch --nproc_per_node=4 train_superglue.py --config_path configs/coco_config.yaml
But I get errors:
How can I fix this problem?
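The error text is not included in the post, so the following is not a diagnosis. One thing a script started with torch.distributed.launch usually has to do is work out which local GPU the process owns: by default the launcher passes a --local_rank argument to each worker, while torchrun (or launch with --use_env) sets the LOCAL_RANK environment variable instead. Below is a minimal sketch of reading that value using only the standard library. The function name parse_local_rank is hypothetical, and how train_superglue.py actually handles this is an assumption, not something shown in this thread.

```python
import argparse
import os


def parse_local_rank(argv=None):
    """Return the local GPU index for this worker process (hypothetical helper).

    Looks for a --local_rank command-line argument first, then falls back to
    the LOCAL_RANK environment variable, then to 0 for a single-GPU run.
    """
    parser = argparse.ArgumentParser()
    # --config_path mirrors the flag used in the command above.
    parser.add_argument("--config_path", type=str, default="configs/coco_config.yaml")
    # Accept both spellings; some newer PyTorch releases pass the hyphenated form.
    parser.add_argument("--local_rank", "--local-rank", dest="local_rank",
                        type=int, default=None)
    # parse_known_args ignores any extra flags the real script may define.
    args, _unknown = parser.parse_known_args(argv)

    if args.local_rank is not None:
        return args.local_rank
    return int(os.environ.get("LOCAL_RANK", 0))


if __name__ == "__main__":
    print("local rank:", parse_local_rank())
```

In a real training script the returned index would typically be used to select the CUDA device before the process group is initialised and the model is wrapped for distributed training.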