BalazsSzekeres opened 1 year ago
I am facing the exact same issue and can confirm that training works with single-class configs (Car).
My environment setup, as collected by mmdet3d/utils/collect_env.py:
sys.platform: linux
Python: 3.8.15 (default, Nov 24 2022, 15:19:38) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: Tesla V100-SXM2-32GB
CUDA_HOME: /scratch/sbaratam/envs/openmmlab
NVCC: Cuda compilation tools, release 11.6, V11.6.124
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
- GCC 9.3
- C++ Version: 201402
- Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 11.6
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
- CuDNN 8.3.2 (built against CUDA 11.5)
- Magma 2.6.1
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,
TorchVision: 0.14.0
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.6
MMDetection: 2.26.0
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc5+7b493dc
spconv2.0: False
What I noticed is that both @TarunKumar1995-glitch and I use PyTorch 1.13. Downgrading to 1.9.1 with conda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=11.3 -c pytorch -c conda-forge solves the issue, so it seems that the latest PyTorch version is incompatible with the current package. What is the latest PyTorch version that is compatible?
Hi, thanks for reporting this. We will test and make the package compatible with PyTorch 1.13.
I ran into the same problem. Would PyTorch 1.12 work?
There is no need to downgrade PyTorch. My solution is to edit line 140 of mmdetection3d/mmdet3d/models/dense_heads/train_mixins.py:
if self.assign_per_class:
    gt_per_cls = (gt_labels == i).cpu()  # line 140
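The thread does not include the traceback, so the following is a minimal sketch under an assumption: the .cpu() fix suggests that gt_labels lives on the GPU while the tensor indexed with gt_per_cls lives on the CPU, and that PyTorch 1.13 refuses to index a CPU tensor with a CUDA boolean mask. select_class_boxes is a hypothetical helper, not part of mmdet3d; it moves the mask to the device of the indexed tensor, which is a device-agnostic variant of the .cpu() edit above.

```python
import torch


def select_class_boxes(boxes: torch.Tensor, labels: torch.Tensor, cls: int) -> torch.Tensor:
    """Return the rows of `boxes` whose label equals `cls`.

    Hypothetical helper illustrating per-class selection with a boolean mask.
    The mask is moved to the device of `boxes` before indexing, so a CPU box
    tensor can be indexed with a mask computed from CUDA labels.
    """
    mask = labels == cls
    return boxes[mask.to(boxes.device)]


if __name__ == "__main__":
    # Boxes stay on the CPU (assumption: mirrors the CPU-side ground-truth boxes).
    boxes = torch.arange(21, dtype=torch.float32).reshape(3, 7)  # 3 boxes, 7 params each
    # Labels go to the GPU when one is available, as they would during training.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    labels = torch.tensor([0, 1, 0], device=device)

    # Indexing boxes with a CUDA mask directly is presumably what fails on PyTorch 1.13.
    cls_boxes = select_class_boxes(boxes, labels, cls=0)
    print(cls_boxes.shape)  # torch.Size([2, 7])
```

The script prints torch.Size([2, 7]) on both CPU-only and CUDA machines. Using mask.to(boxes.device) instead of a hard-coded .cpu() keeps the selection working even if the boxes are later moved to the GPU.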
Thank you, this is great and very useful! How did you find this?
Prerequisite
Task
I'm using the official example scripts/configs for the officially supported tasks/models/datasets.
Branch
master branch https://github.com/open-mmlab/mmdetection3d
Environment
sys.platform: win32
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 05:59:00) [MSC v.1929 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 3080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7
NVCC: Cuda compilation tools, release 11.7, V11.7.64
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.32.31332 for x64
GCC: n/a
PyTorch: 1.13.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.14.0
OpenCV: 4.5.5
MMCV: 1.6.2
MMCV Compiler: MSVC 193231332
MMCV CUDA Compiler: 11.6
MMDetection: 2.25.3
MMSegmentation: 0.29.1
MMDetection3D: 1.0.0rc5+fcb4545
spconv2.0: False
Reproduces the problem - code sample
Reproduces the problem - command or script
Reproduces the problem - error message
Additional information
Interestingly, training works as expected when running the KITTI 3D Car PointPillars config. I have checked the differences between the two config files, and the only difference is the model, so the issue should be somewhere there.