isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

"Segmentation FCan't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'ault while training RandLANet on S3DIS") #565

Open rsazid99 opened 2 years ago

rsazid99 commented 2 years ago

Checklist

Describe the issue

When I try to run "python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path ../dataset/SemanticKitti --pipeline SemanticSegmentation --dataset.use_cache True --num_workers 0", training fails with:

ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
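For context, the AttributeError comes from Python's pickle module: a function defined inside another function (a closure) cannot be pickled by qualified name, and PyTorch's DataLoader has to pickle the sampler when it starts worker processes with the spawn or forkserver start method. Below is a minimal, self-contained sketch (plain Python, not Open3D-ML code; the names only mirror those in the error) that reproduces the same class of failure:

```python
import pickle

def get_point_sampler():
    # Nested function, analogous to the '_random_centered_gen' closure in the error.
    def _random_centered_gen():
        return 42
    return _random_centered_gen

sampler = get_point_sampler()

try:
    # DataLoader workers started via 'spawn'/'forkserver' receive their arguments
    # through pickle, which is where the real failure happens.
    pickle.dumps(sampler)
except (AttributeError, pickle.PicklingError) as err:
    # e.g. "Can't pickle local object 'get_point_sampler.<locals>._random_centered_gen'"
    print(err)
```

With zero workers the DataLoader runs in the main process and nothing has to be pickled; since the traceback below still shows multiprocessing frames despite --num_workers 0 being passed, the flag presumably never reached the pipeline config (it shows up under "extra arguments" without a pipeline prefix), which matches the config-file workaround suggested later in this thread.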

Steps to reproduce the bug

$ python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path ../dataset/SemanticKitti --pipeline SemanticSegmentation --dataset.use_cache True --num_workers 0

Error message

regular arguments
backend: gloo
batch_size: null
cfg_dataset: null
cfg_file: ml3d/configs/randlanet_semantickitti.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: null
device: cuda
device_ids:

extra arguments
dataset.dataset_path: /home/sazid/Open3D-ML/scripts/dataset/SemanticKitti
dataset.use_cache: 'True'
num_workers: '0'

INFO - 2022-10-20 13:25:56,253 - semantic_segmentation - DEVICE : cuda
INFO - 2022-10-20 13:25:56,253 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_train_2022-10-20_13:25:56.txt
INFO - 2022-10-20 13:25:56,286 - semantickitti - Found 19130 pointclouds for train
INFO - 2022-10-20 13:25:57,425 - semantickitti - Found 4071 pointclouds for validation
INFO - 2022-10-20 13:25:57,677 - semantic_segmentation - Initializing from scratch.
INFO - 2022-10-20 13:25:57,678 - semantic_segmentation - Writing summary in train_log/00008_RandLANet_SemanticKITTI_torch.
INFO - 2022-10-20 13:25:57,678 - semantic_segmentation - Started training
INFO - 2022-10-20 13:25:57,679 - semantic_segmentation - === EPOCH 0/100 ===
training: 0%| | 0/4783 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/sazid/Open3D-ML/scripts/run_pipeline.py", line 245, in <module>
    sys.exit(main())
  File "/home/sazid/Open3D-ML/scripts/run_pipeline.py", line 179, in main
    pipeline.run_train()
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 406, in run_train
    for step, inputs in enumerate(tqdm(train_loader, desc='training')):
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1077, in __init__
    w.start()
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/home/sazid/miniconda3/envs/test/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

Expected behavior

No response

Open3D, Python and System information

- Operating system: Pop OS 22.04
- Python version: Python 3.10
- Open3D version: 0.16.0
- System type: x86 
- Is this a remote workstation?: no
- How did you install Open3D?: pip

Additional information

No response

whuhxb commented 1 year ago

Hi @rsazid99, I have run into the same bug while running semantic segmentation on Toronto3D with RandLA-Net. Have you managed to solve it? Thanks a lot.

rsazid99 commented 1 year ago

@whuhxb I haven't been able to solve this bug yet.

shayan-nikoo commented 1 year ago

This can be caused by multiprocessing problems. Try adding num_workers: 0 and pin_memory: false to the pipeline section of the configs/randlanet_s3dis.yml file. That solved it for me.
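For anyone applying this from Python instead of editing the YAML, here is a sketch of the same workaround following the usage pattern from the Open3D-ML README. The num_workers / pin_memory keys are the ones suggested above; the config and dataset paths are placeholders, and you should verify against your installed version that the pipeline actually reads these keys:

```python
# Sketch: apply the suggested workaround via the Python API rather than the YAML file.
import open3d.ml as _ml3d
import open3d.ml.torch as ml3d

cfg = _ml3d.utils.Config.load_from_file("ml3d/configs/randlanet_semantickitti.yml")

# Keys suggested above: keep data loading in the main process so the sampler
# closure never has to be pickled, and disable pinned memory.
cfg.pipeline["num_workers"] = 0
cfg.pipeline["pin_memory"] = False

# Placeholder dataset path; adjust to your local setup.
cfg.dataset["dataset_path"] = "../dataset/SemanticKitti"
dataset = ml3d.datasets.SemanticKITTI(cfg.dataset.pop("dataset_path", None), **cfg.dataset)
model = ml3d.models.RandLANet(**cfg.model)
pipeline = ml3d.pipelines.SemanticSegmentation(model, dataset=dataset, **cfg.pipeline)
pipeline.run_train()
```

The same two keys go in the pipeline section of whichever config is being used (randlanet_s3dis.yml, randlanet_toronto3d.yml, or randlanet_semantickitti.yml as in the original report).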