Can not achieve reported performance [PointPillar Kitti]

ArchipLab-LinfengZhang commented 2 years ago

log.txt log.txt log.txt

Checklist

I have searched related issues but cannot get the expected help.
The bug has not been fixed in the latest version.

Describe the bug

I am trying to reproduce the PointPillars 3-class detection training on KITTI, but I cannot reach the performance reported in the log file: my results are almost 10 AP lower. I have rerun the experiment many times without success, so I suspect something is wrong. I also see a much higher loss than in the provided log file, and I wonder whether this problem is related to #1339.

Reproduction

What command or script did you run?

python ./tools/train.py configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py

Did you make any modifications on the code or config? Did you understand what you have modified?
No changes in the config.

What dataset did you use?
KITTI

Environment

Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.8.8 (default, Apr 13 2021, 19:58:26) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.89
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu102
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.2-Product Build 20210312 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.2
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
CuDNN 7.6.5
Magma 2.5.2
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

TorchVision: 0.10.0+cu102
OpenCV: 4.5.3
MMCV: 1.4.0
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 10.2
MMDetection: 2.19.0
MMSegmentation: 0.20.0
MMDetection3D: 1.0.0rc0+9c7270d


ZCMax commented 2 years ago

We modified our coordinate systems in the v1.0.0.rc0 branch. Were your training pkl files generated with an earlier branch? If so, you need to update them using our provided script update_data_coords.py.

ZCMax commented 2 years ago

Could you please run the following code and post the results here?

ann_file = 'data/kitti/kitti_dbinfos_train.pkl'
data = mmcv.load(ann_file)
for k in a.keys():
    item = a[k][0]
    print(item['box3d_lidar'])

ArchipLab-LinfengZhang commented 2 years ago

That's possible! I generated the training pkl files long ago.

Here are my results. Note that there is a problem in the code you provided: in item = a[k][0] the variable a is not defined, so I assume it should be item = data[k][0]. Here are my input and output.

ann_file = 'data/kitti/kitti_dbinfos_train.pkl'
data = mmcv.load(ann_file)
for k in data.keys():
    item = data[k][0]
    print(item['box3d_lidar'])

[ 8.731381 -1.8559176 -1.5996994 0.48 1.2 1.89 0.01 ]
[13.510703 -0.98178 -1.69449 1.73 4.15 1.57 1.62 ]
[34.377724 12.651429 -1.4623766 0.5 1.95 1.72 1.54 ]
[65.3167 13.056136 -2.1648843 2.2 5.78 2.5 1.5 ]
[26.70097 9.502761 -1.7758954 2.6 16.79 4.02 1.77 ]
[91.24807 -0.7204353 -1.0393468 2.57 14.66 3.46 1.21 ]
[23.538685 3.0487144 -1.497098 1.21 2.07 1.61 1.67 ]
[19.519197 -6.6532526 -1.4515989 0.51 1.19 1.34 -3.14 ]

Thanks for your reply.

ZCMax commented 2 years ago

Yes, here is the right info for new coordinates:

[ 8.73138158 -1.85591748 -1.59969934  1.20000005  0.47999999  1.88999999
 -1.58079636]
[13.51070316 -0.98177999 -1.69448984  4.1500001   1.73000002  1.57000005
  3.09238911]
[34.37772205 12.65142872 -1.46237655  1.95000005  0.5         1.72000003
 -3.11079621]
[65.31670115 13.05613626 -2.1648843   5.78000021  2.20000005  2.5
 -3.07079625]
[26.70097047  9.50276183 -1.77589537 16.79000092  2.5999999   4.01999998
  2.94238925]
[91.24807075 -0.72043527 -1.03934685 14.65999985  2.56999993  3.46000004
 -2.78079629]
[23.53868667  3.04871427 -1.49709801  2.06999993  1.21000004  1.61000001
  3.04238915]
[19.51919761 -6.65325198 -1.45159893  1.19000006  0.50999999  1.34000003
  1.56920373]

You need to update your pkl file using our update_data_coords.py.
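
The two dumps above appear to differ in a regular way: the fourth and fifth entries are swapped, and the last entry (yaw) looks like it was mapped to -yaw - pi/2 and wrapped into [-pi, pi). The sketch below reproduces that relation for single boxes so it can be checked against the posted numbers. The mapping is inferred only from the values in this thread, and convert_old_box is a hypothetical helper, not part of update_data_coords.py, which remains the supported way to update the files.

```python
import math


def convert_old_box(box):
    """Hypothetical helper: map one old-format box to the new layout.

    Assumes (inferred from the values posted in this thread) that entries
    3 and 4 swap places and that yaw becomes -yaw - pi/2, wrapped into
    the interval [-pi, pi).
    """
    x, y, z, d3, d4, d5, yaw = box
    new_yaw = -yaw - math.pi / 2
    # Wrap the angle into [-pi, pi).
    new_yaw -= math.floor(new_yaw / (2 * math.pi) + 0.5) * 2 * math.pi
    return [x, y, z, d4, d3, d5, new_yaw]


# First two boxes from the old-format dump above.
old_boxes = [
    [8.731381, -1.8559176, -1.5996994, 0.48, 1.2, 1.89, 0.01],
    [13.510703, -0.98178, -1.69449, 1.73, 4.15, 1.57, 1.62],
]
for box in old_boxes:
    print([round(v, 4) for v in convert_old_box(box)])
```

Up to rounding, the printed yaw values (-1.5808 and 3.0924) agree with the first two entries of the new-coordinate list.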

Besides, for better performance, we now add plane info in the new pkl converter. Please refer to https://mmdetection3d.readthedocs.io/en/latest/datasets/kitti_det.html.

ArchipLab-LinfengZhang commented 2 years ago

After updating my datasets my problem has been solved. Thanks again!!

Zhangyongtao123 commented 2 years ago

> After updating my datasets my problem has been solved. Thanks again!!

Hello, I have updated the pkl file (only kitti_dbinfos_train.pkl) using update_data_coords.py, but I still can't achieve the reported performance. Are any other modifications needed?

ZCMax commented 2 years ago

What's your current training performance?

Zhangyongtao123 commented 2 years ago

> What's your current training performance?

As follows:

----------- AP11 Results ------------

Pedestrian AP11@0.50, 0.50, 0.50:
bbox AP11:0.0000, 0.0010, 0.0064
bev AP11:60.0762, 54.2552, 50.3532
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.01
Pedestrian AP11@0.50, 0.25, 0.25:
bbox AP11:0.0000, 0.0010, 0.0064
bev AP11:71.5240, 67.8475, 64.0308
3d AP11:21.8397, 20.9342, 19.8065
aos AP11:0.00, 0.00, 0.01
Cyclist AP11@0.50, 0.50, 0.50:
bbox AP11:0.0000, 0.0041, 0.0041
bev AP11:78.1637, 63.1229, 59.8215
3d AP11:0.0000, 0.0000, 0.0000
aos AP11:0.00, 0.00, 0.00
Cyclist AP11@0.50, 0.25, 0.25:
bbox AP11:0.0000, 0.0041, 0.0041
bev AP11:81.9036, 68.4794, 65.6991
3d AP11:34.9415, 26.3673, 25.3158
aos AP11:0.00, 0.00, 0.00
Car AP11@0.70, 0.70, 0.70:
bbox AP11:87.8253, 76.6921, 75.5581
bev AP11:88.7598, 84.4438, 79.0230
3d AP11:81.6915, 65.8434, 64.2589
aos AP11:87.70, 76.28, 74.72
Car AP11@0.70, 0.50, 0.50:
bbox AP11:87.8253, 76.6921, 75.5581
bev AP11:90.6397, 89.6409, 88.8227
3d AP11:88.2804, 77.8367, 77.1427
aos AP11:87.70, 76.28, 74.72

Overall AP11@easy, moderate, hard:
bbox AP11:29.2751, 25.5658, 25.1896
bev AP11:75.6666, 67.2740, 63.0659
3d AP11:27.2305, 21.9478, 21.4196
aos AP11:29.23, 25.43, 24.91
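
The Overall rows appear to be the plain mean of the three per-class rows at the strict thresholds, so the zero Pedestrian and Cyclist entries pull the bbox and 3d averages down. Below is a quick arithmetic check on the AP11 easy column, using only numbers copied from the table above. The averaging rule is inferred from these values, not taken from the evaluation code.

```python
# Easy-difficulty AP11 values at the strict thresholds, copied from the table above.
easy_ap11 = {
    "bbox": {"Pedestrian": 0.0000, "Cyclist": 0.0000, "Car": 87.8253},
    "bev": {"Pedestrian": 60.0762, "Cyclist": 78.1637, "Car": 88.7598},
    "3d": {"Pedestrian": 0.0000, "Cyclist": 0.0000, "Car": 81.6915},
}


def class_mean(per_class):
    """Unweighted mean over classes (assumed aggregation rule)."""
    return sum(per_class.values()) / len(per_class)


overall_easy = {metric: class_mean(vals) for metric, vals in easy_ap11.items()}
for metric, value in overall_easy.items():
    print(f"{metric}: {value:.4f}")
```

The printed means can be compared with the first column of the Overall AP11 rows (29.2751, 75.6666, 27.2305).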

----------- AP40 Results ------------

Pedestrian AP40@0.50, 0.50, 0.50:
bbox AP40:0.0000, 0.0000, 0.0017
bev AP40:59.7399, 53.2210, 49.1660
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Pedestrian AP40@0.50, 0.25, 0.25:
bbox AP40:0.0000, 0.0000, 0.0017
bev AP40:72.0966, 68.0654, 64.2382
3d AP40:16.4732, 14.9519, 13.4765
aos AP40:0.00, 0.00, 0.00
Cyclist AP40@0.50, 0.50, 0.50:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:80.2887, 63.4230, 59.8364
3d AP40:0.0000, 0.0000, 0.0000
aos AP40:0.00, 0.00, 0.00
Cyclist AP40@0.50, 0.25, 0.25:
bbox AP40:0.0000, 0.0000, 0.0000
bev AP40:84.3066, 68.7550, 65.7825
3d AP40:32.0423, 22.6880, 21.3693
aos AP40:0.00, 0.00, 0.00
Car AP40@0.70, 0.70, 0.70:
bbox AP40:91.8606, 78.3506, 75.8679
bev AP40:91.4483, 85.5870, 83.0440
3d AP40:83.5863, 66.2167, 63.5449
aos AP40:91.70, 77.89, 75.00
Car AP40@0.70, 0.50, 0.50:
bbox AP40:91.8606, 78.3506, 75.8679
bev AP40:95.7356, 92.6676, 90.0372
3d AP40:92.6666, 81.4687, 78.9910
aos AP40:91.70, 77.89, 75.00

Overall AP40@easy, moderate, hard:
bbox AP40:30.6202, 26.1169, 25.2899
bev AP40:77.1590, 67.4103, 64.0155
3d AP40:27.8621, 22.0722, 21.1816
aos AP40:30.57, 25.96, 25.00

The command I ran:

CUDA_VISIBLE_DEVICES=2,3,4,5,6,7 ./tools/dist_train.sh ./configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py 6 --autoscale-lr
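
The config name suggests a schedule tuned for 8 GPUs with 6 samples each (assuming the usual samples-per-GPU x GPUs naming), while this run uses 6 GPUs. Under the linear scaling rule, the learning rate is scaled by the ratio of the actual to the reference total batch size. The sketch below only illustrates that arithmetic: linear_scaled_lr is a hypothetical helper, base_lr is a placeholder rather than the config's actual value, and whether --autoscale-lr applies exactly this factor in a distributed run should be checked against tools/train.py.

```python
def linear_scaled_lr(base_lr, gpus, samples_per_gpu,
                     ref_gpus=8, ref_samples_per_gpu=6):
    """Scale a learning rate by actual / reference total batch size.

    The reference of 8 GPUs x 6 samples is an assumption read off the
    "6x8" part of the config file name.
    """
    actual_batch = gpus * samples_per_gpu
    ref_batch = ref_gpus * ref_samples_per_gpu
    return base_lr * actual_batch / ref_batch


base_lr = 1.0  # placeholder, so the result reads directly as a scale factor
print(linear_scaled_lr(base_lr, gpus=6, samples_per_gpu=6))  # 36 / 48 = 0.75
```

With 6 GPUs and an unchanged per-GPU batch, the rule gives three quarters of the reference learning rate.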

Env:

sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,3,4,5: GeForce RTX 2080 Ti
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.5.5
MMCV: 1.4.8
MMCV Compiler: GCC 7.4
MMCV CUDA Compiler: 10.1
MMDetection: 2.22.0
MMSegmentation: 0.22.0
MMDetection3D: 1.0.0rc1+333536f

Any problems?

open-mmlab / mmdetection3d

Can not achieve reported performance [PointPillar Kitti] #1340