microsoft / MASS

MASS: Masked Sequence to Sequence Pre-training for Language Generation
https://arxiv.org/pdf/1905.02450.pdf

supNMT pre-train problem with multi gpus #177

Open Andrewlesson opened 2 years ago

Andrewlesson commented 2 years ago

The pre-training script from supNMT only runs on a single GPU. When I use multiple GPUs to pre-train supNMT, I get the error below. Has anyone encountered the same situation?

Traceback (most recent call last):
  File "/search/odin/txguo/anaconda3/envs/mass/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/fairseq_cli/train.py", line 298, in cli_main
    nprocs=args.distributed_world_size,
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 167, in spawn
    while not spawn_context.join():
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 103, in join
    (error_index, name)
Exception: process 0 terminated with signal SIGKILL
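
A note on reading the last line of the traceback: it appears to come from the parent process, which only reports that worker 0 was killed from outside by SIGKILL. The worker itself did not raise a Python error. The sketch below is not fairseq or PyTorch code. It is a minimal standard-library analogue, assuming a POSIX system, and the helper names `describe_exit` and `kill_and_describe` are made up for illustration. It shows how a parent sees a negative exit code when a child is killed and how that code maps back to a signal name.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a POSIX child exit code into a readable message.

    A negative code -N means the child was terminated by signal N.
    """
    if returncode < 0:
        return f"terminated with signal {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


def kill_and_describe() -> str:
    """Start a sleeping child, SIGKILL it from outside, and describe its exit."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # sends SIGKILL on POSIX
    return describe_exit(child.wait())


if __name__ == "__main__":
    print("process 0", kill_and_describe())
```

Because the killed worker leaves no traceback of its own, the cause has to be found outside Python. One common source of SIGKILL on Linux is the kernel's out-of-memory killer, so checking host memory usage and the kernel log while the job starts may be a reasonable first step, though nothing in this thread confirms that this is the cause here.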

jiaohuix commented 2 years ago

How do you run it with multiple GPUs?