isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen' #572

whuhxb opened 1 year ago

whuhxb commented 1 year ago

Describe the issue

When I run RandLA-Net on the Toronto3D dataset, I hit this bug. It seems to be a multiprocessing error; have you seen it before?

AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
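
For context: pickle can only serialize functions that are importable by their qualified name, so a function defined inside another function (like _random_centered_gen inside SemSegRandomSampler.get_point_sampler) is unpicklable. A minimal standalone reproduction (placeholder bodies, not the actual Open3D-ML code):

import pickle

def get_point_sampler():
    # Nested function: it has no importable qualified name,
    # so pickle cannot serialize it.
    def _random_centered_gen():
        pass
    return _random_centered_gen

pickle.dumps(get_point_sampler())
# AttributeError: Can't pickle local object
#   'get_point_sampler.<locals>._random_centered_gen'

The DataLoader hits exactly this when it serializes the sampler for its worker processes.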

Steps to reproduce the bug

training:   0%|          | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/export/home2/hanxiaobing/Documents/Open3D-ML-code/Open3D-ML/scripts/run_pipeline.py", line 246, in <module>
    sys.exit(main())
  File "/export/home2/hanxiaobing/Documents/Open3D-ML-code/Open3D-ML/scripts/run_pipeline.py", line 180, in main
    pipeline.run_train()
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 406, in run_train
    for step, inputs in enumerate(tqdm(train_loader, desc='training')):
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 438, in __iter__
    return self._get_iterator()
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 384, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1048, in __init__
    w.start()
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/export/home2/hanxiaobing/anaconda3/envs/Open3D-ML-Pytorch/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
[W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
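
The popen_forkserver.py frames show the DataLoader is starting workers with the forkserver start method, which has to pickle everything handed to a worker. Under the fork start method the child inherits the parent's memory instead, so the local function would never need to be pickled. One possible workaround (untested here, Linux-only, and risky once a CUDA context exists, so the num_workers: 0 fix suggested in the comments below is safer):

import torch.multiprocessing as mp

if __name__ == '__main__':
    # Hypothetical workaround: force 'fork' so DataLoader workers
    # inherit the sampler closure instead of pickling it.
    mp.set_start_method('fork', force=True)
    # ... then build the dataset/model and call pipeline.run_train() ...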

Error message

Same traceback as under "Steps to reproduce the bug" above.

Expected behavior

No response

Open3D, Python and System information

CUDA 11.6
Python 3.10
Ubuntu 20.04

Additional information

No response

whuhxb commented 1 year ago

With the TensorFlow version of the code, training runs successfully.

shayan-nikoo commented 1 year ago

This can be caused by multiprocessing problems. Try adding num_workers: 0 and pin_memory: false to the pipeline section of the configs/randlanet_s3dis.yml file. That solved it for me.
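
For reference, a sketch of what the pipeline section of that config might look like after the change (num_workers and pin_memory as suggested above; the other keys are illustrative, keep whatever the file already contains):

pipeline:
  name: SemanticSegmentation
  num_workers: 0      # load batches in the main process; the sampler is never pickled
  pin_memory: false
  batch_size: 2       # illustrative; keep the existing value

With num_workers: 0 the DataLoader spawns no worker processes at all, which sidesteps the pickling step entirely at the cost of slower data loading.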