isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

custom dataset issue #273

Closed: samux87 closed this issue 3 years ago

samux87 commented 3 years ago

Hi Guys,

I'm trying to create a new dataset class called Van, but I'm running into an issue when I run:

import open3d.ml.torch as ml3d
dataset = ml3d.datasets.Van()

and the error is:

AttributeError: module 'open3d.ml.torch.datasets' has no attribute 'Van'

I also tried:

import sys
sys.path.append('/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/datasets')
import van

but I got:

ImportError: attempted relative import with no known parent package

Any idea how to solve this?

Thank you, Sam.

sanskar107 commented 3 years ago

@samux87 you also have to add an entry in ml3d/datasets/__init__.py. Also make sure you run source set_open3d_ml_root.sh so that the cloned Open3D-ML is used instead of the one from the installed wheel.
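
For reference, the entry itself is just an import plus an __all__ entry, roughly like this (a sketch; keep whatever is already in the file, and it assumes the Van class lives in ml3d/datasets/van.py):

# ml3d/datasets/__init__.py (sketch)
from .van import Van       # new entry; the existing imports stay as they are

# then add 'Van' to the existing __all__ list so it is re-exported, e.g.
# __all__ = [..., 'Van']

With that entry in place and set_open3d_ml_root.sh sourced, ml3d.datasets.Van should resolve.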

samux87 commented 3 years ago

It worked!

Now I have a new issue:

bash randlanet_van.sh torch /media/sam/grosso/Datasets/roadmarking/van_open3d
Using external Open3D-ML in /media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML
regular arguments
cfg_dataset: null
cfg_file: ml3d/configs/van.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: /media/sam/grosso/Datasets/roadmarking/van_open3d
device: gpu
framework: torch
main_log_dir: null
model: null
pipeline: SemanticSegmentation
split: train

extra arguments
{}

INFO - 2021-05-13 16:24:58,508 - semantic_segmentation - DEVICE : cuda
INFO - 2021-05-13 16:24:58,508 - semantic_segmentation - Logging in file : ./logs/RandLANet_Van_torch/log_train_2021-05-13_16:24:58.txt
INFO - 2021-05-13 16:24:58,509 - van - Found 1 pointclouds for train
preprocess:   0%|                                                                   | 0/1 [00:00<?, ?it/s]POINTS SHAPE:  (54853907, 3)
labels SHAPE:  (54853907,)
preprocess: 100%|███████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.26it/s]
INFO - 2021-05-13 16:24:58,954 - van - Found 1 pointclouds for validation
INFO - 2021-05-13 16:24:58,955 - semantic_segmentation - Initializing from scratch.
2021-05-13 16:24:59.038683: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
INFO - 2021-05-13 16:24:59,645 - semantic_segmentation - Writing summary in train_log/00032_RandLANet_Van_torch.
INFO - 2021-05-13 16:24:59,646 - semantic_segmentation - Started training
INFO - 2021-05-13 16:24:59,646 - semantic_segmentation - === EPOCH 0/100 ===
training:   0%|                                                                    | 0/50 [00:00<?, ?it/s]/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/torch/models/randlanet.py:450: UserWarning: Mixed memory format inputs detected while calling the operator. The operator will output channels_last tensor even if some of the inputs are not in channels_last format. (Triggered internally at  /pytorch/aten/src/ATen/native/TensorIterator.cpp:924.)
  result = m_leakyrelu(f_pc + shortcut)
training: 100%|███████████████████████████████████████████████████████████| 50/50 [00:17<00:00,  2.84it/s]
/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/torch/optim/lr_scheduler.py:118: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
validation: 100%|█████████████████████████████████████████████████████████| 10/10 [00:01<00:00,  5.35it/s]
valid acc:  []
valid_losses:  []
valid_conf_m:  []
SELF.valid_conf_m:  []
valid_conf_m:  0.0
Traceback (most recent call last):
  File "scripts/run_pipeline.py", line 134, in <module>
    main()
  File "scripts/run_pipeline.py", line 130, in main
    pipeline.run_train()
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py", line 426, in run_train
    self.save_logs(writer, epoch)
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/torch/pipelines/semantic_segmentation.py", line 466, in save_logs
    valid_total_acc = np.sum(np.diag(valid_conf_m)) / np.sum(valid_conf_m)
  File "<__array_function__ internals>", line 5, in diag
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/numpy/lib/twodim_base.py", line 285, in diag
    raise ValueError("Input must be 1- or 2-d.")
ValueError: Input must be 1- or 2-d.

It seems like the validation loss wasn't computed at all; I don't know why.
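
As far as I can tell, the ValueError itself just comes from summing an empty list of confusion matrices. A tiny reproduction outside the pipeline (illustration only, not the pipeline code):

import numpy as np

valid_conf_m = []                       # nothing was appended during validation
summed = np.sum(valid_conf_m, axis=0)   # summing an empty list collapses to a 0-d scalar
print(summed)                           # 0.0, matching the 'valid_conf_m:  0.0' print above
np.diag(summed)                         # ValueError: Input must be 1- or 2-d.

So the real question is why the validation loop collected no accuracies, losses or confusion matrices in the first place.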

And with TF I got:

bash randlanet_van.sh tf /media/sam/grosso/Datasets/roadmarking/van_open3d
Using external Open3D-ML in /media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML
regular arguments
cfg_dataset: null
cfg_file: ml3d/configs/van.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: /media/sam/grosso/Datasets/roadmarking/van_open3d
device: gpu
framework: tf
main_log_dir: null
model: null
pipeline: SemanticSegmentation
split: train

extra arguments
{}

2021-05-13 16:30:07.807158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-13 16:30:08.403776: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-13 16:30:08.581671: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-05-13 16:30:08.581755: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.585930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-05-13 16:30:08.585953: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-13 16:30:08.585976: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-13 16:30:08.586777: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-05-13 16:30:08.586971: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-05-13 16:30:08.586997: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-05-13 16:30:08.587835: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-05-13 16:30:08.590235: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-13 16:30:08.590286: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.597775: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.602076: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-13 16:30:08.612967: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-05-13 16:30:08.617208: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 3600000000 Hz
2021-05-13 16:30:08.617437: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f3da7bdc0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-05-13 16:30:08.617452: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-05-13 16:30:08.681237: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.681604: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560f3c3a22e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-05-13 16:30:08.681619: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2021-05-13 16:30:08.681728: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.682016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7335GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2021-05-13 16:30:08.682054: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2021-05-13 16:30:08.682063: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-13 16:30:08.682086: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2021-05-13 16:30:08.682100: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2021-05-13 16:30:08.682109: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2021-05-13 16:30:08.682121: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2021-05-13 16:30:08.682135: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-13 16:30:08.682162: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.682446: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.682712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-05-13 16:30:08.972705: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-13 16:30:08.972731: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2021-05-13 16:30:08.972736: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2021-05-13 16:30:08.972850: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.973164: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-05-13 16:30:08.973452: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6661 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO - 2021-05-13 16:30:09,055 - semantic_segmentation - <ml3d.tf.models.randlanet.RandLANet object at 0x7f037d46b490>
INFO - 2021-05-13 16:30:09,055 - semantic_segmentation - Logging in file : ./logs/RandLANet_Van_tf/log_train_2021-05-13_16:30:09.txt
INFO - 2021-05-13 16:30:09,055 - van - Found 1 pointclouds for training
INFO - 2021-05-13 16:30:09,201 - van - Found 1 pointclouds for validation
INFO - 2021-05-13 16:30:09,258 - semantic_segmentation - Writing summary in train_log/00033_RandLANet_Van_tf.
INFO - 2021-05-13 16:30:09,259 - semantic_segmentation - Initializing from scratch.
INFO - 2021-05-13 16:30:09,259 - semantic_segmentation - === EPOCH 0/100 ===
training:   0%|                                                                   | 0/100 [00:00<?, ?it/s]2021-05-13 16:30:09.499249: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2021-05-13 16:30:09.661945: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2021-05-13 16:30:10.031454: E tensorflow/stream_executor/cuda/cuda_dnn.cc:318] Loaded runtime CuDNN library: 7.5.0 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-05-13 16:30:10.031856: W ./tensorflow/stream_executor/stream.h:2049] attempting to perform DNN operation using StreamExecutor without DNN support
training:   0%|                                                                   | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "scripts/run_pipeline.py", line 134, in <module>
    main()
  File "scripts/run_pipeline.py", line 130, in main
    pipeline.run_train()
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/tf/pipelines/semantic_segmentation.py", line 251, in run_train
    results = model(inputs, training=True)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/tf/models/randlanet.py", line 241, in call
    f_encoder_i = self.forward_dilated_res_block(
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/tf/models/randlanet.py", line 205, in forward_dilated_res_block
    f_pc = m_conv2d(feature, training=self.training)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/media/sam/grosso/OneDrive/ML/pointNet/Open3D-ML/ml3d/tf/utils/helper_tf.py", line 55, in call
    x = self.batch_normalization(x, training=training)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 985, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/layers/normalization.py", line 720, in call
    outputs = self._fused_batch_norm(inputs, training=training)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/layers/normalization.py", line 576, in _fused_batch_norm
    output, mean, variance = tf_utils.smart_cond(training, train_op,
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/utils/tf_utils.py", line 64, in smart_cond
    return smart_module.smart_cond(
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/framework/smart_cond.py", line 54, in smart_cond
    return true_fn()
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/keras/layers/normalization.py", line 542, in _fused_batch_norm_training
    return nn.fused_batch_norm(
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/ops/nn_impl.py", line 1637, in fused_batch_norm
    y, running_mean, running_var, _, _, _ = gen_nn_ops.fused_batch_norm_v3(
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 4268, in fused_batch_norm_v3
    _ops.raise_from_not_ok_status(e, name)
  File "/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([2,45056,1,8]) [Op:FusedBatchNormV3]

Sam.

sanskar107 commented 3 years ago

Could you try the dev branch? All the latest fixes are in the dev branch.

samux87 commented 3 years ago

I cloned the dev branch and ran python -c "import open3d.ml.torch as ml3d". This is the output:

ImportError: cannot import name 'furthest_point_sampling' from 'open3d.ml.torch.ops' (/home/sam/anaconda3/envs/semanticKitti/lib/python3.8/site-packages/open3d/ml/torch/ops/__init__.py)

Maybe I need to reinstall Open3D in a different way? I don't know.

Thank you again, Sam.

sanskar107 commented 3 years ago

Ah, these ops are only available in the latest Open3D, which will be released within a week. You can either build Open3D from source or, as a workaround, comment out every other model in ml3d/torch/models/__init__.py.
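
Roughly like this (just a sketch; the exact model names in your checkout may differ, keep only the ones whose ops import cleanly):

# ml3d/torch/models/__init__.py (workaround sketch)
from .randlanet import RandLANet
# from .kpconv import KPFCNN        # commented out: needs ops from the not-yet-released wheel

__all__ = ['RandLANet']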

hduonggithub commented 3 years ago

Hi @samux87, @sanskar107 ,

I am working with the S3DIS dataset. Training and testing work well. If I want to run inference on my own point cloud, do I have to define a custom dataset?

Any clue/hint/help is very welcome.

Thanks!

samux87 commented 3 years ago

Hi @hduonggithub, I have tried this script: https://github.com/intel-isl/Open3D-ML/blob/master/examples/vis_pred.py, but the predictions are not good in my case. I don't know whether that is because of the script or something else; it may work well for you.

Sam.

sanskar107 commented 3 years ago

@hduonggithub you don't have to define a custom dataset for inference. You can just import the model and pipeline, then run pipeline.run_inference(data), where data is a dictionary, e.g. data = {'point': pc, 'feat': None}.
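
A minimal end-to-end sketch with RandLANet (num_classes, the checkpoint path and the file name below are placeholders; adjust them to your own config):

import numpy as np
import open3d.ml.torch as ml3d

model = ml3d.models.RandLANet(num_classes=2)                   # must match the training config
pipeline = ml3d.pipelines.SemanticSegmentation(model, device="gpu")
pipeline.load_ckpt(ckpt_path="path/to/ckpt.pth")               # hypothetical checkpoint path

pc = np.load("my_scan.npy").astype(np.float32)                 # (N, 3) array of x, y, z
data = {'point': pc, 'feat': None}
# some versions also expect a dummy 'label' entry:
# data['label'] = np.zeros(len(pc), dtype=np.int32)

result = pipeline.run_inference(data)
print(result)    # typically a dict with per-point 'predict_labels' and 'predict_scores'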

samux87 commented 3 years ago

Hi @sanskar107, if I want to use x, y, z + intensity in my dataset, what do I have to change?

I tried to:



Did I do something wrong?

Thank you,
Samuele.

sanskar107 commented 3 years ago

@samux87 your changes should work. Can you print the shape of feat right before this assertion statement in randlanet.py?
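
For reference, feeding intensity to the model usually means the dataset split returns it under 'feat', roughly like this (a sketch with a made-up file layout, since I can't see your actual changes):

import numpy as np

class VanSplit:                                    # sketch of the split class inside van.py
    def __init__(self, path_list):
        self.path_list = path_list

    def get_data(self, idx):
        pc = np.load(self.path_list[idx])          # assumed layout: (N, 5) = x, y, z, intensity, label
        points = pc[:, 0:3].astype(np.float32)
        feat = pc[:, 3:4].astype(np.float32)       # intensity as a one-channel per-point feature
        labels = pc[:, 4].astype(np.int32)
        return {'point': points, 'feat': feat, 'label': labels}

The model's input dimension in the config (dim_input in the RandLANet configs) then has to account for the extra channel, so printing feat.shape right before that assertion should tell you whether the two agree.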

sanskar107 commented 3 years ago

Closing due to inactivity.