ligang-cs / PseCo

An official implementation of the PseCo (ECCV2022)
Apache License 2.0
RuntimeError #15

Closed monsterlv-lhj closed 1 year ago

monsterlv-lhj commented 2 years ago

RuntimeError: Timed out initializing process group in store based barrier on rank: 0, for key: store_based_barrier_key:1 (world_size=4, worker_count=16, timeout=0:30:00)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 820989) of binary: /home/lihejun/anaconda3/envs/semi-det/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (FAILED). Waiting 300 seconds for other agents to finish
/home/lihejun/anaconda3/envs/semi-det/lib/python3.6/site-packages/torch/distributed/elastic/utils/store.py:71: FutureWarning: This is an experimental API and will be changed in future.

ligang-cs commented 1 year ago

It may be caused by the PyTorch version. PyTorch 1.9.0 is preferred.
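
To check which version an environment actually has, something like the following could be used. It is only a minimal sketch: the helper names `parse_version` and `matches_recommended` are made up for illustration and are not part of PseCo or PyTorch, and the only value taken from this thread is the 1.9.0 recommendation. It assumes the version string starts with `major.minor.patch`, possibly followed by a build suffix such as `+cu111`.

```python
import re

# Version recommended in the reply above.
RECOMMENDED = (1, 9, 0)


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from a string such as '1.9.0+cu111'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def matches_recommended(version: str) -> bool:
    """True if the version equals the recommended one, ignoring build suffixes."""
    return parse_version(version) == RECOMMENDED


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "matches" if matches_recommended(torch.__version__) else "differs from"
        print(f"Installed PyTorch {torch.__version__} {status} the recommended 1.9.0")
```

Running the script inside the training environment (here `semi-det`) prints whether the installed build matches. A mismatch does not prove the version is the cause of the timeout; it only tells you whether the suggestion above applies.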