open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

TypeError: cannot pickle 'dict_keys' object #1364

Closed qfwysw closed 2 years ago

qfwysw commented 2 years ago

Environment

fatal: not a git repository (or any parent up to mount point /opt/data)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.10.0+cu111
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.22.0
MMSegmentation: 0.22.1
MMDetection3D: 1.0.0rc0+

Command

./tools/dist_train.sh ./projects/configs/detr3d/detr3d_res101_gridmask.py 1

Error

Traceback (most recent call last):
  File "./tools/train.py", line 248, in <module>
    main()
  File "./tools/train.py", line 237, in main
    train_model(
  File "/opt/data/private/glchen/projects/detr3d/mmdection3d/mmdet3d/apis/train.py", line 64, in train_model
    train_detector(
  File "/opt/data/private/glchen/projects/detr3d/mmdection/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'dict_keys' object

Results

./projects/configs/detr3d/detr3d_res101_gridmask.py --gpu-ids 7

I want to implement some ideas of my own based on mmdet3d. When I run the command above (non-distributed, with --gpu-ids 7), training runs fine. But when I use distributed training, even with only one GPU, I get the error shown in the traceback.
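The traceback passes through popen_spawn_posix.py, which indicates the DataLoader worker processes are being started with the 'spawn' method. Under spawn, the objects handed to each worker (including the dataset) are pickled, and pickle cannot serialize a dict_keys view. The sketch below reproduces that failure in isolation. The names class_to_idx, view_state, and list_state are hypothetical and are not taken from DETR3D; where the dict_keys object actually lives in that project is not shown in this thread.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False if it raises TypeError."""
    try:
        pickle.dumps(obj)
    except TypeError:
        return False
    return True


# Hypothetical stand-in for state that a dataset object might hold.
class_to_idx = {"car": 0, "pedestrian": 1}

# A dict view (dict_keys) cannot be pickled.
view_state = {"classes": class_to_idx.keys()}

# Converting the view to a list makes the same state picklable.
list_state = {"classes": list(class_to_idx.keys())}

print(is_picklable(view_state))
print(is_picklable(list_state))
```

If an attribute like this exists in the dataset or pipeline, wrapping the `.keys()` call in `list(...)` is one way to make it survive pickling, regardless of the start method.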

konyul commented 2 years ago

+1

konyul commented 2 years ago

Add torch.multiprocessing.set_start_method('fork') in train.py, like this:

if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()

Tai-Wang commented 2 years ago

This seems to be a bug in the DETR3D project. Please follow its instructions for using mmdet3d and open an issue there for discussion, which can guide you to the correct solution more directly.

Darkzj commented 2 years ago

I added torch.multiprocessing.set_start_method('fork'), but an error occurred: ValueError: cannot find context for 'fork'. How can I solve it?

Setting workers_per_gpu=0 works.
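With zero workers, the PyTorch DataLoader loads batches in the main process, so no worker process is started and nothing has to be pickled, at the cost of slower data loading. A minimal sketch of the corresponding config fragment, assuming an MMDetection 2.x style config; the samples_per_gpu value is an illustrative assumption, and the remaining dataset entries are omitted:

```python
# Fragment of an MMDetection-style config file (Python syntax).
data = dict(
    samples_per_gpu=1,   # assumed value, keep whatever the config already uses
    workers_per_gpu=0,   # 0 = load data in the main process, no worker processes
)
```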

Darkzj commented 2 years ago

> I added torch.multiprocessing.set_start_method('fork'), but an error occurred: ValueError: cannot find context for 'fork'. How can I solve it?
>
> Setting workers_per_gpu=0 works.

My sys.platform is Windows, so 'fork' is not supported.
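Whether 'fork' is available can be checked before calling set_start_method, which avoids the ValueError on platforms such as Windows. A minimal sketch using the standard multiprocessing module; pick_start_method is a hypothetical helper name, not part of mmdet3d or PyTorch:

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default."""
    available = mp.get_all_start_methods()
    if preferred in available:
        return preferred
    # The first entry of get_all_start_methods() is the platform default.
    return available[0]


if __name__ == "__main__":
    method = pick_start_method("fork")
    # force=True allows setting the method even if one was already chosen.
    mp.set_start_method(method, force=True)
    print(f"using start method: {method}")
```

On Windows this falls back to 'spawn', so the worker arguments are still pickled and the original TypeError can still appear. There the remaining options are workers_per_gpu=0 or removing the unpicklable dict_keys object.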

hzm-January commented 1 year ago

> I added torch.multiprocessing.set_start_method('fork'), but an error occurred: ValueError: cannot find context for 'fork'. How can I solve it?
>
> Setting workers_per_gpu=0 works.

How can I solve this? Thanks.