ylabbe / cosypose

Code for "CosyPose: Consistent multi-view multi-object 6D pose estimation", ECCV 2020.
MIT License

Multi-gpu on a single node #61

Open Arrebol2020 opened 3 years ago

Arrebol2020 commented 3 years ago

Hello, I can successfully run 'Single gpu on a single node', but when I try 'Multi-gpu on a single node', I get the following error:

Traceback (most recent call last):
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/scripts/run_cosypose_eval.py", line 16, in <module>
    from cosypose.config import EXP_DIR, MEMORY, RESULTS_DIR, LOCAL_DATA_DIR
  File "/home/sevati/anaconda3/envs/cosypose/lib/python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/config.py", line 33, in <module>
    assert LOCAL_DATA_DIR.exists()
AssertionError
Setting OMP and MKL num threads to 1.

Why is LOCAL_DATA_DIR resolved from python3.7/site-packages/cosypose-1.0.0-py3.7-linux-x86_64.egg/cosypose/config.py and not from projects/cosypose/cosypose?
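
One possible reading of the traceback above is that Python imported the installed egg under site-packages rather than the project checkout, so any directory that config.py derives from the package location would point into site-packages. That derivation is an assumption here, not something confirmed in this thread. A minimal sketch for checking which copy of a package gets imported; `module_origin` and `is_installed_copy` are hypothetical helper names:

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_installed_copy(path: Path) -> bool:
    """Heuristic: True if the path lies inside site-packages or an .egg."""
    return any(
        part in ("site-packages", "dist-packages") or part.endswith(".egg")
        for part in path.parts
    )


if __name__ == "__main__":
    origin = module_origin("cosypose")
    if origin is None:
        print("cosypose is not importable from this environment")
    else:
        kind = "installed copy" if is_installed_copy(origin) else "source checkout"
        print(f"cosypose resolves to {origin} ({kind})")
```

If this reports an installed copy, reinstalling from the checkout in development mode (for example `pip install -e .`) is a common way to make imports resolve to the project directory. Whether that fixes this particular assertion is untested here.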

Arrebol2020 commented 3 years ago

The error has now changed to:

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, unhandled cuda error, NCCL version 2.7.8
Setting OMP and MKL num threads to 1.
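
"unhandled cuda error" is a generic NCCL message, so a usual first step is to rerun with the `NCCL_DEBUG` environment variable set to `INFO`, which makes NCCL log the underlying failure. A minimal sketch of building such an environment; `nccl_debug_env` is a hypothetical helper name, and the launch command itself is left to the caller:

```python
import os
from typing import Dict, Mapping, Optional


def nccl_debug_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with verbose NCCL logging enabled."""
    env = dict(os.environ if base is None else base)
    env["NCCL_DEBUG"] = "INFO"  # NCCL prints init and error details per rank
    return env


if __name__ == "__main__":
    env = nccl_debug_env()
    print("NCCL_DEBUG =", env["NCCL_DEBUG"])
```

The returned dictionary can be passed as `env=` to `subprocess.run` together with the same multi-GPU launch command that produced the error.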

anxiaomi commented 2 years ago

@Arrebol2020 Hello, have you solved it?

Arrebol2020 commented 2 years ago

> @Arrebol2020 Hello, have you solved it?

I didn't solve it, so I tried implementing DDP myself, and it seems to work.
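
For readers wondering what a hand-written DDP setup can look like, here is a minimal sketch. It is not Arrebol2020's implementation, which was not posted in this thread. It assumes each process is started by a launcher such as `torchrun` that sets `RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR` and `MASTER_PORT`. `DistEnv`, `read_dist_env` and `wrap_with_ddp` are hypothetical names:

```python
import os
from typing import Mapping, NamedTuple, Optional


class DistEnv(NamedTuple):
    rank: int        # global process index
    local_rank: int  # GPU index on this node
    world_size: int  # total number of processes


def read_dist_env(env: Optional[Mapping[str, str]] = None) -> DistEnv:
    """Read launcher-provided variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


def wrap_with_ddp(model, dist_env: DistEnv):
    """Bind this process to its GPU and wrap the model in DDP."""
    # Imported lazily so the helper above works without PyTorch installed.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel

    # Select the GPU before NCCL is initialised so each rank uses its own device.
    torch.cuda.set_device(dist_env.local_rank)
    if not dist.is_initialized():
        # "env://" reads RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT.
        dist.init_process_group(backend="nccl", init_method="env://")
    model = model.cuda(dist_env.local_rank)
    return DistributedDataParallel(model, device_ids=[dist_env.local_rank])
```

In a training script this would be used as `model = wrap_with_ddp(model, read_dist_env())`, usually together with a `DistributedSampler` on the data loader so that each rank sees a different shard of the data.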

kochsebastian commented 2 years ago

@Arrebol2020 could you share your work?