isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

Fix pytorch 1.13 error due to 2D weights #601

Closed ssheorey closed 1 year ago

ssheorey commented 1 year ago

Fixes a regression that appears when moving to PyTorch 1.13.1. Also reported in #567, #580, and #590.

Tested that the change works with both PyTorch 1.13.1 and TensorFlow 2.8.4.

$ python -mipdb scripts/run_pipeline.py torch -c ml3d/configs/randlanet_semantickitti.yml --dataset.dataset_path /export/share/datasets/SemanticKITTI/ --pipeline SemanticSegmentation --dataset.use_cache True --pipeline.num_workers 0 --pipeline.pin_memory False
/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/runpy.py:127: RuntimeWarning: 'ipdb.__main__' found in sys.modules after import of package 'ipdb', but prior to execution of 'ipdb.__main__'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))                                                                                                   
> /mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py(1)<module>()                            
----> 1 import os                                                                                                             
      2 import argparse                                                                                                       
      3 import logging                                                                                                        

ipdb> c                                                                                                                       
Using external Open3D-ML in /home/ssheorey/projects/open3d_ml/Open3D-ML_2                                                     
regular arguments                                                                                                             
backend: gloo                                                                                                                 
batch_size: null                                                                                                              
cfg_dataset: null                                                                                                             
cfg_file: ml3d/configs/randlanet_semantickitti.yml                                                                            
cfg_model: null                                                                                                               
cfg_pipeline: null                                                                                                            
ckpt_path: null                                                                                                               
dataset: null                                                                                                                 
dataset_path: null                                                                                                            
device: cuda                                                                                                                  
device_ids:                                                                                                                   
- '0'                                                                                                                         
framework: torch                                                                                                              
host: localhost                                                                                                               
main_log_dir: null                                                                                                            
max_epochs: null                                                                                                              
mode: null                                                                                                                    
model: null                                                                                                                   
node_rank: 0                                                                                                                  
nodes: 1                                                                                                                      
pipeline: SemanticSegmentation                                                                                                
port: '12355'                                                                                                                 
seed: 0                                                                                                                       
split: train                                                                                                                  

extra arguments                                                                                                               
dataset.dataset_path: /export/share/datasets/SemanticKITTI/                                                                   
dataset.use_cache: 'True'      
pipeline.num_workers: '0'                                                                                                     
pipeline.pin_memory: 'False'                                                                                                  

INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - DEVICE : cuda   
INFO - 2023-06-13 12:57:02,092 - semantic_segmentation - Logging in file : ./logs/RandLANet_SemanticKITTI_torch/log_train_2023-06-13_12:57:02.txt
INFO - 2023-06-13 12:57:02,645 - semantickitti - Found 19130 pointclouds for train                                            
INFO - 2023-06-13 12:57:06,678 - semantickitti - Found 4071 pointclouds for validation
INFO - 2023-06-13 12:57:08,010 - semantic_segmentation - Initializing from scratch.                                           
INFO - 2023-06-13 12:57:08,019 - semantic_segmentation - Writing summary in train_log/00013_RandLANet_SemanticKITTI_torch.    
INFO - 2023-06-13 12:57:08,023 - semantic_segmentation - Started training                                                     
INFO - 2023-06-13 12:57:08,024 - semantic_segmentation - === EPOCH 0/100 ===                                                  
training:   0%|                                                                                      | 0/4783 [00:02<?, ?it/s]
Traceback (most recent call last):  
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/ipdb/__main__.py", line 323, in main               
    pdb._runscript(mainpyfile)                                                                                                
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/pdb.py", line 1573, in _runscript                                
    self.run(statement)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/bdb.py", line 580, in run                                        
    exec(cmd, globals, locals)                                 
  File "<string>", line 1, in <module>                                                                                        
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 1, in <module>             
    import os                                                                                                                 
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/scripts/run_pipeline.py", line 192, in main 
    pipeline.run_train()
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/pipelines/semantic_segmentation.py", line 411, in run_train
    loss, gt_labels, predict_scores = model.get_loss(
  File "/mnt/beegfs/mixed-tier/share/projects/open3d_ml/Open3D-ML_2/ml3d/torch/models/randlanet.py", line 378, in get_loss
    loss = Loss.weighted_CrossEntropyLoss(scores, labels)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "/home/ssheorey/miniconda3/envs/o3dml38/lib/python3.8/site-packages/torch/nn/functional.py", line 3026, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
RuntimeError: weight tensor should be defined either for all 19 classes or no classes but got weight tensor of shape: [1, 19]  
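
For anyone hitting the same error elsewhere: the message indicates that the class-weight tensor reached `cross_entropy` with shape [1, 19] rather than [19]. The snippet below is a minimal sketch of one way to avoid it, by flattening the weights to 1D before building the loss. The names `weights_2d`, `weights_1d`, `scores`, and `labels` are illustrative, and the uniform weights and random inputs are placeholders. This is not necessarily the exact change made in this PR.

```python
import torch
import torch.nn as nn

num_classes = 19  # class count taken from the error message above

# Hypothetical per-class weights stored with a leading axis: shape [1, 19].
weights_2d = torch.ones(1, num_classes)

# Flatten to shape [num_classes], which is what the error message asks for.
weights_1d = weights_2d.reshape(-1)

criterion = nn.CrossEntropyLoss(weight=weights_1d)

# Placeholder logits and labels standing in for real model output.
scores = torch.randn(8, num_classes)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(scores, labels)
```

Flattening with `reshape(-1)` is harmless when the weights are already 1D, so the same code should work unchanged on PyTorch versions that tolerated the [1, 19] shape.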
