open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0
5.3k stars 1.54k forks

There may be an error in py_sigmoid_focal_loss #575

Closed qfwysw closed 3 years ago

qfwysw commented 3 years ago

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The issue has not been fixed in the latest version.

Describe the issue

I am trying to understand the theory behind focal loss by reading py_sigmoid_focal_loss. After reading it, I ran the code below, but got an error.

    loss_cls = self.loss_cls(cls_score, labels, label_weights, avg_factor=num_total_samples)
    ll = py_sigmoid_focal_loss(cls_score, labels, label_weights, avg_factor=num_total_samples)

Reproduction

  1. What command or script did you run?

    python tools/train.py configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py
  2. What config did you run?

A placeholder for the config.
  3. Did you make any modifications to the code or config? Did you understand what you have modified?

    I added the code below to loss_single in mmdet3d/models/dense_heads/anchor3d_head.py:

    from mmdet.models.losses.focal_loss import py_sigmoid_focal_loss
    ll = py_sigmoid_focal_loss(cls_score, labels, label_weights, avg_factor=num_total_samples)
  4. What dataset did you use?

    KITTI

Environment

  1. Please run python mmdet3d/utils/collect_env.py to collect necessary environment information and paste it here.

    fatal: not a git repository (or any parent up to mount point /)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
    sys.platform: linux
    Python: 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0]
    CUDA available: True
    GPU 0,1: GeForce RTX 3090
    CUDA_HOME: /usr/local/cuda-11.2
    NVCC: Build cuda_11.2.r11.2/compiler.29558016_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.1+cu110
    PyTorch compiling details: PyTorch built with:

    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 11.0
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80
    • CuDNN 8.0.5
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,

    TorchVision: 0.8.2+cu110
    OpenCV: 4.5.2
    MMCV: 1.2.7
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 11.2
    MMDetection: 2.10.0
    MMDetection3D: 0.13.0+

  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

RuntimeError: The size of tensor a (3) must match the size of tensor b (1928448) at non-singleton dimension 1

Tai-Wang commented 3 years ago

Please follow the issue template for 'Reimplementation Questions' to redescribe your question.

Wuziyi616 commented 3 years ago

I am not sure why you are calling py_sigmoid_focal_loss yourself. We already have self.loss_cls, which is a FocalLoss (see here in the config).

Also, if you look here, you will see that you can't call py_sigmoid_focal_loss directly. The tensors need some pre-processing first.
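A minimal sketch of the kind of pre-processing meant here, assuming the pure-PyTorch path wants a one-hot target with the same (N, num_classes) shape as the prediction. The function `sigmoid_focal_loss_sketch` is a simplified stand-in written for illustration, not the mmdet implementation, and the tensor sizes and label values are made up.

```python
import torch
import torch.nn.functional as F


def sigmoid_focal_loss_sketch(pred, target, gamma=2.0, alpha=0.25):
    """Element-wise sigmoid focal loss (illustrative stand-in, not mmdet's code).

    pred:   (N, C) raw logits
    target: (N, C) one-hot targets (an all-zero row means background)
    """
    pred_sigmoid = pred.sigmoid()
    target = target.type_as(pred)
    # Probability mass assigned to the wrong outcome for each element.
    pt = (1 - pred_sigmoid) * target + pred_sigmoid * (1 - target)
    focal_weight = (alpha * target + (1 - alpha) * (1 - target)) * pt.pow(gamma)
    bce = F.binary_cross_entropy_with_logits(pred, target, reduction='none')
    return bce * focal_weight


num_classes = 3
cls_score = torch.randn(8, num_classes)          # flattened (N, C) logits
labels = torch.tensor([0, 1, 2, 3, 3, 3, 0, 3])  # class index; 3 == background

# Index labels of shape (N,) cannot broadcast against (N, C) logits, which is
# the kind of size mismatch reported in the RuntimeError above.
# One-hot encode with an extra background column, then drop that column.
target = F.one_hot(labels, num_classes=num_classes + 1)[:, :num_classes]

loss = sigmoid_focal_loss_sketch(cls_score, target)  # shape (N, C)
```

Passing `labels` instead of `target` to the sketch reproduces the same class of shape error, since a (8, 3) tensor is multiplied by a (8,) tensor.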

qfwysw commented 3 years ago

Thanks for your solution, it solves the problem. However, I have another question: the KITTI gt_labels value for "don't care" objects is -1. This causes a new error in target = F.one_hot(target, num_classes=num_classes + 1).

Wuziyi616 commented 3 years ago

Errr, so first of all I want to ask, can you train the code without any modifications?

Wuziyi616 commented 3 years ago

I don't understand why you want to call the loss function yourself instead of calling self.loss_cls. I think self.loss_cls is also a FocalLoss, which is exactly what you want?

qfwysw commented 3 years ago

Emmmm, the truth is that I want to learn the PointPillars code, but it jumps between files too many times. So I'm trying to pull the code out of mmcv. It's hard for me to write the focal loss in CUDA C, so I just want to use py_sigmoid_focal_loss to replace it.

qfwysw commented 3 years ago

Of course the code can run correctly without any modification.

Wuziyi616 commented 3 years ago

OK, I understand, that's no problem. As for your question about the -1 label: if you look into the CUDA code of focal loss here, it simply ignores the -1 label by setting loss=0. There seems to be no such operation in py_sigmoid_focal_loss.

In my opinion, this is indeed a bug. But since py_sigmoid_focal_loss is designed for debugging, we don't expect it to work perfectly. I think focal loss is easy to understand, so you could just use the original implementation. Or you could manually filter out the -1 labels before calling py_sigmoid_focal_loss.
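A minimal sketch of the manual filtering suggested above, assuming flattened (N, C) scores and (N,) labels where -1 marks samples to ignore. The tensor sizes and values are made up for illustration. Two options are shown: dropping the ignored samples, or keeping them and zeroing their weight, which mimics the "loss = 0" behaviour described for the CUDA version.

```python
import torch
import torch.nn.functional as F

num_classes = 3
cls_score = torch.randn(6, num_classes)       # flattened (N, C) logits
labels = torch.tensor([0, -1, 2, 3, -1, 1])   # -1 == ignore, 3 == background
label_weights = torch.ones(6)

# F.one_hot rejects negative class indices, so handle -1 before encoding.
valid = labels >= 0

# Option 1: drop the ignored samples entirely.
cls_score_valid = cls_score[valid]
labels_valid = labels[valid]
label_weights_valid = label_weights[valid]
target = F.one_hot(labels_valid, num_classes=num_classes + 1)[:, :num_classes]

# Option 2: keep N unchanged. Map -1 to the background index so one_hot
# accepts it, and zero the per-sample weight so it contributes no loss.
safe_labels = torch.where(valid, labels, torch.full_like(labels, num_classes))
masked_weights = label_weights * valid.float()
target_full = F.one_hot(safe_labels, num_classes=num_classes + 1)[:, :num_classes]
```

With option 1 the filtered tensors can be passed on in place of the originals. With option 2 the shapes stay the same and only the weights change, so whether avg_factor also needs adjusting depends on how it was computed.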

Wuziyi616 commented 3 years ago

Feel free to re-open this issue should you have any further questions :)