Regression when moving to PyTorch 1.13.1.
Also reported in #567, #580 and #590.
Tested: the change works with both PyTorch 1.13.1 and TensorFlow 2.8.4.
$ python -mipdb scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path /export/share/datasets/SemanticKITTI/ --pipeline SemanticSegmentation --dataset.use_cache True --pipeline.num_workers 0 --pipeline.pin_memory False
/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/runpy.py:127: RuntimeWarning: 'ipdb.__main__' found in sys.modules after import of package 'ipdb', but prior to execution of 'ipdb.__main__'; this may result in unpredictable behaviour
warn(RuntimeWarning(msg))
> /mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py(1)<module>()
----> 1 import os
2 import argparse
3 import logging
ipdb> c
Using external Open3D-ML in /home/ssheorey/projects/open3d_ml/Open3D-ML_2
regular arguments
backend: gloo
batch_size: null
cfg_dataset: null
cfg_file: ml3d/configs/randlanet_semantickitti.yml
cfg_model: null
cfg_pipeline: null
ckpt_path: null
dataset: null
dataset_path: null
device: cuda
device_ids:
- '0'
framework: torch
host: localhost
main_log_dir: null
max_epochs: null
mode: null
model: null
node_rank: 0
nodes: 1
pipeline: SemanticSegmentation
port: '12355'
seed: 0
split: train
extra arguments
dataset.dataset_path: /export/share/datasets/SemanticKITTI/
dataset.use_cache: 'True'
pipeline.num_workers: '0'
pipeline.pin_memory: 'False'
INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - DEVICE : cuda
INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_train_2023-06-13_12:57:02.txt
INFO - 2023-06-13 12:57:02,645 - semantickitti - Found 19130 pointclouds for train
INFO - 2023-06-13 12:57:06,678 - semantickitti - Found 4071 pointclouds for validation
INFO - 2023-06-13 12:57:08,010 - semantic_segmentation - Initializing from scratch.
INFO - 2023-06-13 12:57:08,019 - semantic_segmentation - Writing summary in train_log/00013_RandLANet_SemanticKITTI_torch.
INFO - 2023-06-13 12:57:08,023 - semantic_segmentation - Started training
INFO - 2023-06-13 12:57:08,024 - semantic_segmentation - === EPOCH 0/100 ===
training: 0%| | 0/4783 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/ipdb/__main__.py", line 323, in main
pdb._runscript(mainpyfile)
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/pdb.py", line 1573, in _runscript
self.run(statement)
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/bdb.py", line 580, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 1, in <module>
import os
File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 192, in main
pipeline.run_train()
File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/pipelines/semantic_segmentation.py", line 411, in run_train
loss, gt_labels, predict_scores = model.get_loss(
File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/models/randlanet.py", line 378, in get_loss
loss = Loss.weighted_CrossEntropyLoss(scores, labels)
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/functional.py", line 3026, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]
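
The error message indicates that the class-weight tensor reaches cross_entropy with shape [1, 19], while a 1-D tensor with one entry per class is expected. The snippet below is a minimal sketch of one way to satisfy that requirement, not necessarily the change made in this PR. The names class_weights, flat_weights, scores and labels, and the uniform weight values, are hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn

num_classes = 19

# Hypothetical per-class weights carrying an extra leading dimension,
# i.e. shape [1, 19] as reported in the RuntimeError above.
class_weights = torch.ones(1, num_classes)

# Dummy logits and labels for 8 points (illustrative data only).
scores = torch.randn(8, num_classes)
labels = torch.randint(0, num_classes, (8,))

# Flatten the weights to shape [19] before handing them to the loss,
# so that there is exactly one weight per class in a 1-D tensor.
flat_weights = class_weights.reshape(-1)

criterion = nn.CrossEntropyLoss(weight=flat_weights)
loss = criterion(scores, labels)
print(flat_weights.shape, loss.item())
```

Flattening with reshape(-1) leaves the values unchanged and is a no-op when the weights are already 1-D, so a sketch like this should not alter behaviour for inputs that already have the expected shape.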