open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0

[Bug] Bounding Box Loss Always Zero with Rotated RTMDet Model on New Dataset #1048

Closed FrancescoManigrass closed 4 months ago

FrancescoManigrass commented 4 months ago

Prerequisite

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmrotate/tree/1.x

Environment

sys.platform: win32
Python: 3.8.19 (default, Mar 20 2024, 19:55:45) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8
NVCC: Cuda compilation tools, release 11.8, V11.8.89
MSVC: Microsoft (R) C/C++ Optimizing Compiler version 19.39.33523 for x64
GCC: n/a
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.9.0+cu111
OpenCV: 4.10.0
MMEngine: 0.10.4
MMRotate: 1.0.0rc1+1dc8d77

Reproduces the problem - code sample

I tried to train the rotated_rtmdet_l-coco_pretrain-3x-x model on a new dataset, changing the number of classes in the config to match it. The bounding box loss (bbloss) is always zero. While debugging, I found that the model predicts background for every sample. Those predictions are discarded, so the loss comes out as zero.
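The mechanism described above can be illustrated with a small sketch. It is not MMRotate's loss code. It assumes that the box loss is averaged only over samples carrying a foreground label and that the background index equals the number of classes; the function name, the loss values, and the class count are hypothetical.

```python
def mean_bbox_loss(per_sample_losses, labels, background_label):
    """Average the box loss over foreground samples only (simplified sketch)."""
    foreground = [
        loss for loss, label in zip(per_sample_losses, labels)
        if label != background_label
    ]
    # With no foreground samples there is nothing to regress, so the loss is 0.
    if not foreground:
        return 0.0
    return sum(foreground) / len(foreground)


num_classes = 3            # hypothetical class count
background = num_classes   # assumption: background index == num_classes

# Every sample is background: the box loss collapses to zero.
all_background_loss = mean_bbox_loss(
    [0.7, 1.2, 0.4], [background, background, background], background)

# Two foreground samples: the box loss is their mean.
mixed_loss = mean_bbox_loss([0.7, 1.2, 0.4], [0, background, 2], background)
```

Under these assumptions, a box loss of exactly zero means that no sample reached the regression term as foreground. That points toward the labels or the sample assignment rather than toward the learning rate.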

Steps to Reproduce:

1. Load the rotated_rtmdet_l-coco_pretrain-3x-x model.
2. Modify the number of classes in the configuration to match the new dataset.
3. Start training the model on the new dataset.
4. Observe the bounding box loss during training.

Observed Behavior:

The bounding box loss remains at zero throughout the training. The model predicts all classes as background, leading to a loss of zero.

Expected Behavior:

The model should correctly predict the classes in the new dataset, and the bounding box loss should reflect the differences between the predictions and the ground truth.

Hyperparameters Tried:

- Different learning rates
- Various batch sizes
- Adjusted weight decay and momentum parameters

Additional Information:

- I am confident that I am using the pre-trained weights from the COCO dataset.
- The class annotations in the new dataset have been verified to be correct.

Possible Causes Considered:

- Incorrect number of classes specified in the configuration
- Pre-trained weights not being properly loaded
- Data augmentation or preprocessing issues
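The first possible cause, a class count that does not match the dataset, can be checked inside the config itself. The following is a minimal sketch, assuming an MMEngine-style Python config in which the head reads its class count from bbox_head.num_classes and the dataset takes its class names from a metainfo override. The class names are hypothetical placeholders, not the ones from this dataset.

```python
# Hypothetical class names; replace with the categories of the new dataset.
classes = ('class_a', 'class_b', 'class_c')
num_classes = len(classes)

# Assumption: the detection head reads its class count from this key.
model = dict(bbox_head=dict(num_classes=num_classes))

# Assumption: the dataset takes its class names from a metainfo override.
train_dataloader = dict(dataset=dict(metainfo=dict(classes=classes)))

# Fail fast if the head and the dataset disagree on the number of classes.
assert model['bbox_head']['num_classes'] == len(
    train_dataloader['dataset']['metainfo']['classes'])
```

For a dataset with a single class, the tuple needs a trailing comma, as in ('class_a',). Without it, Python treats the value as a plain string rather than a one-element tuple.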

Reproduces the problem - command or script

python tools/train.py --config configs/rotated_rtmdet/rotated_rtmdet_l-coco_pretrain-3x-dataset3.py

Reproduces the problem - error message

bbloss == 0

Additional information

No response