ylabbe / cosypose

Code for "CosyPose: Consistent multi-view multi-object 6D pose estimation", ECCV 2020.
MIT License

Multi GPU training #43

Open mauku opened 3 years ago

mauku commented 3 years ago

Hello,

I am currently trying to run your training script for the single-view pose estimation model on multiple GPUs. I can run the script on 1 GPU and it trains as expected. However, when I run the script following the steps described in the section "Note on GPU parallelization", I get the following error:

    File "../cosypose/cosypose/utils/multiepoch_dataloader.py", line 44, in next
      idx, batch = self.dataloader_iter._get_data()
    AttributeError: '_DataLoaderIter' object has no attribute '_get_data'

I would be thankful for any help. Thanks in advance!

hannes56a commented 3 years ago

Hi, same problem here. It worked for me until I upgraded my GPU to an RTX 3080 Ti and had to update the NVIDIA driver, CUDA, cuDNN, PyTorch, ...

Has anyone successfully updated all of these?

hannes56a commented 3 years ago

Hey, I found a solution that works for me. I had set "N_WORKERS" to 0 while debugging and forgot to set it back. With 0 workers the iterator is a "_SingleProcessDataLoaderIter" instead of a "_MultiProcessDataLoaderIter" (both in torch.utils.data.dataloader), and the single-process one has no "_get_data". After setting "N_WORKERS" back to the default ("min(N_CPUS - 2, 8)"), the error was gone.
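
For anyone who hits the same thing, here is a minimal sketch of a guard that fails early with a clearer message. It is only an illustration: the helper names `default_n_workers`, `check_n_workers` and `supports_get_data` are hypothetical and not part of cosypose or PyTorch. It assumes the default is computed as quoted above, and it treats `_get_data` as a private attribute that may or may not exist on a given iterator. Note that "min(N_CPUS - 2, 8)" itself evaluates to 0 on a 2-core machine, so the same error could show up there without any debugging change.

```python
import os


def default_n_workers(n_cpus=None):
    """Mirror the default quoted above: min(N_CPUS - 2, 8)."""
    if n_cpus is None:
        # os.cpu_count() may return None, so fall back to 1.
        n_cpus = os.cpu_count() or 1
    return min(n_cpus - 2, 8)


def check_n_workers(n_workers):
    """Raise early if the worker count would select a single-process iterator."""
    if n_workers < 1:
        raise ValueError(
            f"N_WORKERS={n_workers}: the DataLoader would use a single-process "
            "iterator, which has no _get_data; use at least one worker."
        )
    return n_workers


def supports_get_data(dataloader_iter):
    """Return True if the iterator exposes a callable private _get_data."""
    return callable(getattr(dataloader_iter, "_get_data", None))
```

Calling `check_n_workers(default_n_workers())` before building the DataLoader, or `supports_get_data(iter(dataloader))` right after, would turn the AttributeError into a message that points at the worker count.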