open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0

RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. In tools/test.py #1461

Open ammaryasirnaich opened 2 years ago

ammaryasirnaich commented 2 years ago

Hi, I am testing the pre-trained SECOND model with visualization by running the command:

python /workspace/mmdetection3d/tools/test.py \
  /workspace/mmdetection3d/configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py \
  /workspace/working_dir/second_epoch_40.pth \
  --show --show-dir /workspace/working_dir/training_results

However, it raises a RuntimeError at sample 000005.

UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/TensorShape.cpp:2228.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[                                                  ] 3/3769, 0.2 task/s, elapsed: 12s, ETA: 15084sTraceback (most recent call last):
  File "/workspace/mmdetection3d/tools/test.py", line 260, in <module>
    main()
  File "/workspace/mmdetection3d/tools/test.py", line 230, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/workspace/mmdetection3d/mmdet3d/apis/test.py", line 48, in single_gpu_test
    model.module.show_results(
  File "/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 120, in show_results
    show_result(
  File "/workspace/mmdetection3d/mmdet3d/core/visualizer/show_result.py", line 110, in show_result
    0, 255, size=(pred_labels.max() + 1, 3)) / 256
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

My working environment is:

CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.3.r11.3/compiler.29920130_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 
TorchVision: 0.12.0
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.3
MMDetection: 2.23.0
MMSegmentation: 0.22.1
MMDetection3D: 1.0.0rc2+2eed522
spconv2.0: True

I would much appreciate any help!
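For reference, the failure in the traceback above is easy to reproduce in isolation: when a frame has no predicted boxes, pred_labels ends up as an empty tensor, and calling .max() on an empty tensor raises exactly this error. A minimal standalone sketch (not mmdetection3d code):

import torch

# When a frame has no predicted boxes, pred_labels is an empty 1-D tensor.
pred_labels = torch.empty(0, dtype=torch.long)

# This mirrors the failing line in show_result.py:
#   np.random.randint(0, 255, size=(pred_labels.max() + 1, 3)) / 256
try:
    pred_labels.max()
except RuntimeError as err:
    # RuntimeError: max(): Expected reduction dim to be specified for
    # input.numel() == 0. Specify the reduction dim with the 'dim' argument.
    print(err)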

ApoorvaSuresh commented 2 years ago

Hi, I have the same error :( Did you find a solution for it? If so, could you please share it? Thanks in advance :)

ammaryasirnaich commented 2 years ago

Hi, I have the same error :( Did you find a solution for it? If so, could you please share it? Thanks in advance :)

Sorry @ApoorvaSuresh, I am still waiting for help. I have no idea what is causing it!

Tai-Wang commented 2 years ago

Have you tried our pretrained models? Maybe your trained models are not good enough and produce no predictions, which is what causes input.numel() == 0.

ammaryasirnaich commented 2 years ago

@Tai-Wang thanks for your response. I will re-check with the pre-trained model. However, the re-trained model shows more than 72% mAP on the easy, moderate, and hard difficulty levels.

Tai-Wang commented 2 years ago

You can add a breakpoint in the show function and look at why input.numel() == 0. I guess the case of no predictions may be handled during evaluation but not during visualization.
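One way to follow this suggestion is a temporary local edit in mmdet3d/core/visualizer/show_result.py, just before the color-generation line that crashes. A rough debugging sketch (for diagnosis only, not a fix; variable names follow the traceback above):

# Temporary local edit inside show_result(), placed just before the
# np.random.randint(..., size=(pred_labels.max() + 1, 3)) line.
if pred_labels is not None and pred_labels.numel() == 0:
    print('show_result: no predicted boxes for this sample')
    import pdb; pdb.set_trace()  # inspect pred_bboxes / pred_labels here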

ammaryasirnaich commented 2 years ago

@Tai-Wang, I am getting the same error with the pre-trained model:


  File "mmdetection3d/tools/test.py", line 260, in <module>
    main()
  File "mmdetection3d/tools/test.py", line 230, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.show_dir)
  File "/workspace/mmdetection3d/mmdet3d/apis/test.py", line 48, in single_gpu_test
    model.module.show_results(
  File "/workspace/mmdetection3d/mmdet3d/models/detectors/base.py", line 120, in show_results
    show_result(
  File "/workspace/mmdetection3d/mmdet3d/core/visualizer/show_result.py", line 110, in show_result
    0, 255, size=(pred_labels.max() + 1, 3)) / 256
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.

ammaryasirnaich commented 2 years ago

One more thing: I think the pre-trained models must have been trained with spconv 1.0, but I have spconv 2.0 in my environment. Is that going to cause a mismatch issue? As the model starts, I also get the following message in the terminal:

load checkpoint from local path: /workspace/working_dir/hv_second_secfpn.pth
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([16, 3, 3, 3, 4])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 128]).
missing keys in source state_dict: voxel_encoder.vfe_layers.0.norm.weight, voxel_encoder.vfe_layers.0.norm.bias, voxel_encoder.vfe_layers.0.norm.running_mean, voxel_encoder.vfe_layers.0.norm.running_var, voxel_encoder.vfe_layers.0.linear.weight, voxel_encoder.vfe_layers.1.norm.weight, voxel_encoder.vfe_layers.1.norm.bias, voxel_encoder.vfe_layers.1.norm.running_mean, voxel_encoder.vfe_layers.1.norm.running_var, voxel_encoder.vfe_layers.1.linear.weight

Tai-Wang commented 2 years ago

One more thing: I think the pre-trained models must have been trained with spconv 1.0, but I have spconv 2.0 in my environment. Is that going to cause a mismatch issue? As the model starts, I also get the following message in the terminal:

load checkpoint from local path: /workspace/working_dir/hv_second_secfpn.pth
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([16, 3, 3, 3, 4])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 128]).
missing keys in source state_dict: voxel_encoder.vfe_layers.0.norm.weight, voxel_encoder.vfe_layers.0.norm.bias, voxel_encoder.vfe_layers.0.norm.running_mean, voxel_encoder.vfe_layers.0.norm.running_var, voxel_encoder.vfe_layers.0.linear.weight, voxel_encoder.vfe_layers.1.norm.weight, voxel_encoder.vfe_layers.1.norm.bias, voxel_encoder.vfe_layers.1.norm.running_mean, voxel_encoder.vfe_layers.1.norm.running_var, voxel_encoder.vfe_layers.1.linear.weight

The pretrained models of SECOND have not been updated since the coordinate system refactoring. For now, you can try PointPillars with our provided models or train your own SECOND models with our provided configs.
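As an aside, testing one of the provided PointPillars checkpoints follows the same pattern as the command at the top of this issue. A sketch, assuming the KITTI car config name from this release and a locally downloaded checkpoint (placeholder path; exact names may differ in your version):

python tools/test.py \
  configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-car.py \
  /path/to/downloaded_pointpillars_checkpoint.pth \
  --show --show-dir /workspace/working_dir/pointpillars_results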

ammaryasirnaich commented 2 years ago

But @Tai-Wang, I got the error mentioned in the title in the first place while training my own SECOND model with your provided configs!

ammaryasirnaich commented 2 years ago

@Tai-Wang, @ZCMax did you have a chance to further investigate the issue I raised? 1) It gives the same error with the pre-trained model and the provided config file. 2) It gives the same error after retraining the model with the provided config file.

ammaryasirnaich commented 2 years ago

It works fine when I run it with the following command:

python tools/test.py \
  workspace/mmdetection3d/configs/second/mmdetection3d/hv_second_secfpn_fp16_6x8_80e_kitti-3d-car.py \
  /workspace/mmdetection3d/working_dir/hv_second_kitti-3d-car.pth \
  --eval 'mAP' --eval-options 'show=True' 'out_dir=/workspace/mmdetection3d/working_dir/show_results'

jialeli1 commented 2 years ago

One more thing: I think the pre-trained models must have been trained with spconv 1.0, but I have spconv 2.0 in my environment. Is that going to cause a mismatch issue? As the model starts, I also get the following message in the terminal:

load checkpoint from local path: /workspace/working_dir/hv_second_secfpn.pth
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([16, 3, 3, 3, 4])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 128]).
missing keys in source state_dict: voxel_encoder.vfe_layers.0.norm.weight, voxel_encoder.vfe_layers.0.norm.bias, voxel_encoder.vfe_layers.0.norm.running_mean, voxel_encoder.vfe_layers.0.norm.running_var, voxel_encoder.vfe_layers.0.linear.weight, voxel_encoder.vfe_layers.1.norm.weight, voxel_encoder.vfe_layers.1.norm.bias, voxel_encoder.vfe_layers.1.norm.running_mean, voxel_encoder.vfe_layers.1.norm.running_var, voxel_encoder.vfe_layers.1.linear.weight

This "mismatch" problem also happened to me. How can it be fixed?

ammaryasirnaich commented 2 years ago

@jialeli1 actually I didn't solve the mismatch problem; that command only solved the RuntimeError: max() issue. The pre-trained model for the config hv_second_secfpn_6x8_80e_kitti-3d-3class.py works, but if you retrain the model and run the evaluation, it keeps giving a size mismatch for middle_encoder.conv_input.0.weight. I am also waiting for help.
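To narrow down where the mismatch comes from, one option is to inspect the checkpoint's parameter shapes directly and compare them with what the load warning reported. A rough diagnostic sketch (the checkpoint path is a placeholder for your own file):

import torch

# Load the checkpoint on CPU and unwrap the state dict if present.
ckpt = torch.load('/workspace/working_dir/hv_second_secfpn.pth', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)

# Print the shapes of the keys the load warning complained about.
for key in ('middle_encoder.conv_input.0.weight',
            'voxel_encoder.vfe_layers.0.linear.weight'):
    if key in state_dict:
        print(key, tuple(state_dict[key].shape))
    else:
        print(key, 'is missing from the checkpoint')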

holtvogt commented 2 years ago

Is it possible to hotfix this by replacing the line in https://github.com/open-mmlab/mmdetection3d/blob/073f353cf21d31beabcbffa45bef10a6f81abff3/mmdet3d/core/visualizer/show_result.py#L106 with

if pred_labels is None or pred_labels.numel() == 0

?
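For reference, a sketch of what that guard could look like around the failing block in show_result.py. This is an illustrative local patch under the assumption that the surrounding code only needs the palette when labels exist; it is not the official fix, and the surrounding code is paraphrased:

import numpy as np

# Inside show_result(): only build a random per-class palette when there are
# predicted labels; otherwise skip colorizing this frame entirely.
if pred_labels is None or pred_labels.numel() == 0:
    pred_labels = None  # nothing to colorize for this frame
else:
    palette = np.random.randint(0, 255, size=(pred_labels.max() + 1, 3)) / 256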

Tracy-git commented 6 months ago

Is it possible to fix this problem by replacing the line in

https://github.com/open-mmlab/mmdetection3d/blob/073f353cf21d31beabcbffa45bef10a6f81abff3/mmdet3d/core/visualizer/show_result.py#L106

if pred_labels is None or pred_labels.numel() == 0

I have solved this error with your suggestion, thanks so much.