Closed qfwysw closed 2 years ago
+1
Add torch.multiprocessing.set_start_method('fork') in train.py, like this:
if __name__ == '__main__':
    torch.multiprocessing.set_start_method('fork')
    main()
This looks like a bug in the DETR3D project. Please follow its instructions for using mmdet3d and open an issue there, which should lead you to the correct solution more directly.
I added torch.multiprocessing.set_start_method('fork'), but an error occurred: ValueError: cannot find context for 'fork'. How can I solve it?
I set workers_per_gpu=0, and it works.
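For reference, in an MMDetection 2.x style config this setting lives in the `data` dict. The fragment below is a minimal sketch; the `samples_per_gpu` value is a placeholder and not taken from this thread. With zero workers, batches are loaded in the main process, so no worker process is started and nothing has to be pickled, at the cost of slower data loading.

```python
# Hypothetical override in the config file; only workers_per_gpu matters here.
data = dict(
    samples_per_gpu=1,   # placeholder, keep whatever your config already uses
    workers_per_gpu=0,   # 0 = load data in the main process (no DataLoader workers)
)
```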
My sys.platform is Windows, so 'fork' is not supported.
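One way to avoid the ValueError is to check which start methods the platform offers before asking for one. This is a rough sketch using the standard `multiprocessing` module; `pick_start_method` is a made-up helper name, and the assumption is that `torch.multiprocessing.set_start_method` accepts the same arguments, since it wraps the standard module.

```python
import multiprocessing as mp


def pick_start_method(preferred="fork"):
    """Return `preferred` if this platform supports it, else the platform default."""
    available = mp.get_all_start_methods()
    if preferred in available:
        return preferred
    # The first entry of the list is the platform default.
    return available[0]


if __name__ == "__main__":
    method = pick_start_method("fork")
    # force=True avoids a RuntimeError if a start method was already set.
    mp.set_start_method(method, force=True)
    print("using start method:", method)
```

On Windows only 'spawn' is available, so this fallback avoids the ValueError but probably not the pickling error itself; there, workers_per_gpu=0 remains the workaround reported in this thread.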
How can I solve it? Thanks.
Environment
fatal: not a git repository (or any parent up to mount point /opt/data)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.8.13 (default, Mar 28 2022, 11:38:47) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu111
PyTorch compiling details: PyTorch built with:
TorchVision: 0.10.0+cu111
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.22.0
MMSegmentation: 0.22.1
MMDetection3D: 1.0.0rc0+
Command
./tools/dist_train.sh ./projects/configs/detr3d/detr3d_res101_gridmask.py 1
Error
Traceback (most recent call last):
  File "./tools/train.py", line 248, in <module>
    main()
  File "./tools/train.py", line 237, in main
    train_model(
  File "/opt/data/private/glchen/projects/detr3d/mmdection3d/mmdet3d/apis/train.py", line 64, in train_model
    train_detector(
  File "/opt/data/private/glchen/projects/detr3d/mmdection/mmdet/apis/train.py", line 208, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 359, in __iter__
    return self._get_iterator()
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/opt/conda/envs/dr/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 918, in __init__
    w.start()
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/conda/envs/dr/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'dict_keys' object
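The traceback passes through popen_spawn_posix.py, so the 'spawn' start method is in effect. With 'spawn', the worker process object and everything it references, including the dataset, is pickled before the worker starts; with 'fork' it is not, which is presumably why switching the start method helps. The snippet below only reproduces the same TypeError with plain Python. The thread does not say which attribute in the project holds the dict_keys view, so the names here are made up for illustration.

```python
import pickle

cfg = {"img": 1, "points": 2}

# A dict_keys view cannot be pickled; a list copy of the same keys can.
state_bad = {"keys": cfg.keys()}
state_ok = {"keys": list(cfg.keys())}


def try_pickle(obj):
    """Return None on success, or the error message if pickling fails."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


print(try_pickle(state_bad))  # an error message mentioning dict_keys
print(try_pickle(state_ok))   # None
```

If the offending attribute can be found in the dataset or pipeline code, wrapping it in list(...) would likely make the object picklable under 'spawn' as well.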
Results
./projects/configs/detr3d/detr3d_res101_gridmask.py --gpu-ids 7
I want to implement some ideas of my own based on mmdet3d. When I execute the above command, the program runs fine. But when I use distributed training, even with only one GPU, I get this error.