isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

Running `scripts/run_pipeline.py` with torch and SemanticKITTI results in a pickling error #640

Open nikste opened 8 months ago

nikste commented 8 months ago

Describe the issue

Running the training pipeline with torch, RandLA-Net, and SemanticKITTI fails with a pickling error: `python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset_path /<path/to>/semantic_kitti/`

Steps to reproduce the bug

`$ python scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset_path /<path/to>/semantic_kitti/`

Error message

regular arguments
backend: gloo
batch_size: null
cfg_dataset: null
cfg_file: ml3d/configs/randlanet_semantickitti.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: /media/nikste/SSD_030_06/semantic_kitti/
device: cuda
device_ids:
- '0'
framework: torch
host: localhost
main_log_dir: null
max_epochs: null
mode: null
model: null
node_rank: 0
nodes: 1
pipeline: SemanticSegmentation
port: '12355'
seed: 0
split: train

extra arguments
{}

open3d-ml/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
INFO - 2024-02-17 19:11:23,311 - semantic_segmentation - DEVICE : cpu
INFO - 2024-02-17 19:11:23,311 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_train_2024-02-17_19-11-23.txt
INFO - 2024-02-17 19:11:23,332 - semantickitti - Found 19130 pointclouds for train
INFO - 2024-02-17 19:11:24,468 - semantickitti - Found 4071 pointclouds for validation
INFO - 2024-02-17 19:11:24,723 - semantic_segmentation - Initializing from scratch.
INFO - 2024-02-17 19:11:24,724 - semantic_segmentation - Writing summary in train_log/00002_RandLANet_SemanticKITTI_torch.
INFO - 2024-02-17 19:11:24,724 - semantic_segmentation - Started training
INFO - 2024-02-17 19:11:24,724 - semantic_segmentation - === EPOCH 0/100 ===
training:   0%|                                                                                                                                            | 0/4783 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "open3d-ml/scripts/run_pipeline.py", line 261, in <module>
    sys.exit(main())
  File "open3d-ml/scripts/run_pipeline.py", line 192, in main
    pipeline.run_train()
  File "open3d-ml/venv/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 405, in run_train
    for step, inputs in enumerate(tqdm(train_loader, desc='training')):
  File "open3d-ml/venv/lib/python3.10/site-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "open3d-ml/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 441, in __iter__
    return self._get_iterator()
  File "open3d-ml/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 388, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "open3d-ml/venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1042, in __init__
    w.start()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 300, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_forkserver.py", line 47, in _launch
    reduction.dump(process_obj, buf)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SemSegRandomSampler.get_point_sampler.<locals>._random_centered_gen'
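
For context, the last line of the traceback is standard `pickle` behavior rather than anything specific to this dataset: the `forkserver` start method pickles the worker process object, and a function defined inside another function cannot be pickled by reference. The sketch below reproduces that failure with the standard library only. The function names mirror the ones in the error message but are stand-ins, not the Open3D-ML code.

```python
import pickle


def get_point_sampler():
    # Stand-in factory (illustrative, not the Open3D-ML implementation):
    # it returns a function defined in its own local scope.
    def _random_centered_gen(num_points):
        return num_points

    return _random_centered_gen


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
        return None
    except (AttributeError, pickle.PicklingError) as exc:
        return exc


# The nested function has no importable module-level name, so pickling fails.
error = try_pickle(get_point_sampler())
print(type(error).__name__, error)
```

The exact exception type and wording can differ between Python versions, which is why the sketch catches both `AttributeError` and `pickle.PicklingError`.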

Expected behavior

No response

Open3D, Python and System information

- Operating system: Ubuntu

Additional information

No response

nikste commented 8 months ago

Seems to be related to this issue: https://github.com/isl-org/Open3D-ML/issues/478
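
One general way to avoid a "Can't pickle local object" error is to replace the nested function with a module-level callable class, which `pickle` can locate by name. The following is only a sketch of that pattern: `RandomCenteredSampler`, its parameters, and its windowing logic are hypothetical and are not the Open3D-ML sampler or an actual fix from the repository.

```python
import pickle
import random


class RandomCenteredSampler:
    """Hypothetical picklable stand-in for a nested sampler function."""

    def __init__(self, seed=0):
        # State that a closure would capture is stored as plain attributes.
        self.seed = seed

    def __call__(self, num_points, sample_size):
        # Pick a random center, then return a window of indices around it,
        # clipped so the window stays inside [0, num_points).
        rng = random.Random(self.seed)
        center = rng.randrange(num_points)
        half = sample_size // 2
        start = max(0, min(center - half, num_points - sample_size))
        return list(range(start, start + min(sample_size, num_points)))


sampler = RandomCenteredSampler(seed=0)

# Instances of a module-level class survive a pickle round trip,
# which is what a forkserver or spawn worker process requires.
restored = pickle.loads(pickle.dumps(sampler))
print(restored(100, 10) == sampler(100, 10))
```

Whether this pattern fits the sampler named in the traceback depends on what state the original nested function captures, which is not shown in this issue.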