The pre-training script from sup-nmt only runs on a single GPU. When I use multiple GPUs to pre-train supNMT, I get the error below. Has anyone encountered the same situation?
Traceback (most recent call last):
  File "/search/odin/txguo/anaconda3/envs/mass/bin/fairseq-train", line 8, in <module>
    sys.exit(cli_main())
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/fairseq_cli/train.py", line 298, in cli_main
    nprocs=args.distributed_world_size,
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 167, in spawn
    while not spawn_context.join():
  File "/search/odin/txguo/anaconda3/envs/mass/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 103, in join
    (error_index, name)
Exception: process 0 terminated with signal SIGKILL