isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

Weight Tensor Dimension Issue when training on SemanticKitti #567

Open SvenMala opened 1 year ago

SvenMala commented 1 year ago

Describe the issue

I have been trying to train KPConv on SemanticKITTI using the PyTorch pipeline.

I use the default /home/user/Open3D-ML-master/ml3d/configs/kpconv_semantickitti.yml config file, with "pin_memory: False" added to the pipeline section.

The dataset was downloaded using the /home/user/Open3D-ML-master/scripts/download_datasets/download_semantickitti.sh script.

After preprocessing completes successfully, I keep getting a runtime error that seems to come from an unexpected weight tensor dimension:

RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]

I get the same error using the RandLANet model.

Please give me any advice on how to deal with this issue.

Steps to reproduce the bug

import os
import open3d.ml as _ml3d
import open3d.ml.torch as ml3d

dataset = ml3d.datasets.SemanticKITTI(dataset_path='/Datasets/SemanticKitti', use_cache=True)

cfg_file = "/Open3D-ML-master/ml3d/configs/kpconv_semantickitti.yml"
cfg = _ml3d.utils.Config.load_from_file(cfg_file)

# create the model with random initialization.
model = ml3d.models.KPFCNN(**cfg.model)

pipeline = ml3d.pipelines.SemanticSegmentation(
    model=model, dataset=dataset, num_workers=1, device="cpu", **cfg.pipeline)

# prints training progress in the console.
pipeline.run_train()

Error message

File "/home/user/anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/spyder_kernels/py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)
File "/home/user//Desktop/run_the_training.py", line 21, in <module>
    pipeline.run_train()
File "/home/user//anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 411, in run_train
    loss, gt_labels, predict_scores = model.get_loss(
File "/home/user//anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/open3d/_ml3d/torch/models/kpconv.py", line 339, in get_loss
    self.output_loss = Loss.weighted_CrossEntropyLoss(scores, labels)
File "/home/user//anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
File "/home/user//anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/torch/nn/modules/loss.py", line 1164, in forward
    return F.cross_entropy(input, target, weight=self.weight,
File "/home/user//anaconda3/envs/pointcloud_pytorch/lib/python3.10/site-packages/torch/nn/functional.py", line 3014, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]

Expected behavior

I expected the model to train on the SemanticKITTI dataset, but I keep getting the error above.

Open3D, Python and System information

- Operating system: Ubuntu 20.04
- Python version: 3.10.6 
- Open3D version: 0.16.0
- Is this remote workstation?: no
- How did you install Open3D?: pip

Additional information

No response

ashishrana160796 commented 1 year ago

Hello @SvenMala, I faced a similar issue while training RandLA-Net on my custom dataset. I think the root cause is the shape of the class_weights parameter used to train these models.
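To make the shape mismatch concrete, here is a minimal sketch in plain PyTorch, independent of Open3D-ML. The dummy weights, scores, and labels are assumptions for illustration only. The error message reports a weight of shape [1, 19], whereas nn.CrossEntropyLoss documents its weight argument as a 1-D tensor with one entry per class, so flattening the tensor before building the loss should be enough.

```python
import torch
import torch.nn as nn

num_classes = 19  # class count reported in the error message

# Hypothetical weights with an extra leading axis, shape [1, 19],
# mirroring the shape reported in the RuntimeError.
weights_2d = torch.ones(1, num_classes, dtype=torch.float)

# nn.CrossEntropyLoss documents `weight` as a 1-D tensor of size C.
weights_1d = weights_2d.reshape(-1)

criterion = nn.CrossEntropyLoss(weight=weights_1d)

# Dummy logits and labels standing in for per-point scores and labels.
scores = torch.randn(8, num_classes)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(scores, labels)

print(tuple(weights_2d.shape), "->", tuple(weights_1d.shape))
```

Whether a [1, 19] weight is rejected may depend on the installed PyTorch version, which could explain why the same config works in some environments and fails in others.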

My first quick fix was to drop the class weight vector. It was not of much use for my initial prototyping, and training worked without it, although performance will suffer on highly unbalanced datasets.

The second quick fix is to update the CrossEntropyLoss in /usr/local/lib/python3.8/dist-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py, after setting class_weights to an empty list in the config file. The last lines should look something like the snippet below. Basically, you add the class weights to the regular CrossEntropyLoss, which makes it equivalent to the weighted one.

...
...
        else:
            weights = torch.tensor([1.0, 4.0, 10.0, 10.0, 5.0, 1.5], dtype=torch.float, device=device)
            self.weighted_CrossEntropyLoss = nn.CrossEntropyLoss(weight=weights, label_smoothing=0.14)
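Note that the six hard-coded weights above fit a six-class custom dataset. For SemanticKITTI the list would need 19 entries, one per class. A small guard can catch a mismatch before training starts. The sketch below is a hypothetical helper (build_weighted_ce is not part of Open3D-ML) that flattens the weights, checks their count, and returns the loss. It keeps the weights on the CPU, so moving them to the training device is left to the caller.

```python
import torch
import torch.nn as nn


def build_weighted_ce(class_weights, num_classes, label_smoothing=0.0):
    """Hypothetical helper: CrossEntropyLoss with one weight per class."""
    # Flatten so that inputs of shape [1, C] or [C] both become 1-D.
    weights = torch.as_tensor(class_weights, dtype=torch.float).reshape(-1)
    if weights.numel() != num_classes:
        raise ValueError(
            f"expected {num_classes} class weights, got {weights.numel()}")
    return nn.CrossEntropyLoss(weight=weights, label_smoothing=label_smoothing)


# Six weights, as in the snippet above, for a six-class dataset.
criterion = build_weighted_ce([1.0, 4.0, 10.0, 10.0, 5.0, 1.5],
                              num_classes=6, label_smoothing=0.14)

# Dummy logits and labels for four points.
scores = torch.randn(4, 6)
labels = torch.tensor([0, 1, 2, 5])
loss = criterion(scores, labels)
```

Calling the helper with the same six weights and num_classes=19 raises a ValueError immediately, instead of failing inside the first training step.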