rshaojimmy / MultiModal-DeepFake

[TPAMI 2024 & CVPR 2023] PyTorch code for DGM4: Detecting and Grounding Multi-Modal Media Manipulation and beyond

Training fails when running sh train.sh #39

Open songzhuoyuan opened 4 months ago

songzhuoyuan commented 4 months ago

Traceback (most recent call last):
  File "train.py", line 557, in <module>
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(args, config))
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/autodl-tmp/code/MultiModal-DeepFake-main/train.py", line 316, in main_worker
    init_dist(args)
  File "/root/autodl-tmp/code/MultiModal-DeepFake-main/tools/env.py", line 13, in init_dist
    _init_dist_pytorch(args)
  File "/root/autodl-tmp/code/MultiModal-DeepFake-main/tools/env.py", line 27, in _init_dist_pytorch
    dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 576, in init_process_group
    store, rank, world_size = next(rendezvous_iterator)
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 183, in _tcp_rendezvous_handler
    store = _create_c10d_store(result.hostname, result.port, rank, world_size, timeout)
  File "/root/miniconda3/envs/DGM4/lib/python3.8/site-packages/torch/distributed/rendezvous.py", line 157, in _create_c10d_store
    return TCPStore(
RuntimeError: Stop_waiting response is expected

Yizhichaoai commented 2 months ago

I'm running into the same problem. Did you manage to solve it?

songzhuoyuan commented 2 months ago

It seems not.
