rgcottrell / pytorch-human-performance-gec

A PyTorch implementation of "Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study"
Apache License 2.0
50 stars 19 forks

dist_c10d is not defined training error - distributed_utils #9

Open NikhilCherian opened 4 years ago

NikhilCherian commented 4 years ago

@rgcottrell @tianfeichen @cqlijingwei Hey again. Thanks for all the earlier replies. I was able to preprocess, train, and test everything in Google Colab. But recently I switched to training on my gaming laptop and got the following error.

dist_c10d is not defined [screenshot]

Can you explain more about has_c10d? In Google Colab, these were the parameters: ddp_backend='c10d', distributed_backend='nccl', distributed_init_method=None, distributed_port=-1, distributed_rank=0, distributed_world_size=1. But on my laptop, distributed_utils.py cannot import torch.distributed as dist_c10d; it always falls back to importing it as dist_no_c10d. Can you guide me here? [screenshot] When I use init_fn = dist_no_c10d.init_process_group instead, it does start importing the data. [screenshot]
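Not from the repository itself, but the behavior described above can be sketched generically: older fairseq-based code probes whether the c10d-backed torch.distributed API imports cleanly and falls back to the legacy (no_c10d) path otherwise. The helper names backend_available and pick_ddp_backend below are hypothetical, and the module path probed is an assumption; check the import block at the top of your distributed_utils.py for the exact names.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    # Hypothetical helper: returns True if `module_name` could be
    # imported, without actually importing it (cheap availability probe).
    return importlib.util.find_spec(module_name) is not None


def pick_ddp_backend() -> str:
    # Sketch of the c10d-vs-no_c10d selection the issue describes
    # (assumption): use 'c10d' when torch.distributed is importable,
    # otherwise fall back to the legacy 'no_c10d' path.
    try:
        available = backend_available("torch.distributed")
    except ModuleNotFoundError:
        # find_spec imports parent packages, so a missing torch
        # surfaces here rather than as a False return value.
        available = False
    return "c10d" if available else "no_c10d"
```

If pick_ddp_backend() returns "no_c10d" on the laptop but "c10d" in Colab, the difference is almost certainly the installed torch version rather than the training configuration.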

Any help would be appreciated. Thanks in advance.