lyuwenyu / RT-DETR

[CVPR 2024] Official RT-DETR (RTDETR paddle pytorch), Real-Time DEtection TRansformer, DETRs Beat YOLOs on Real-time Object Detection. 🔥 🔥 🔥
Apache License 2.0

Fix Inconsistency Between Training and Inference Caused by Multi-Scale Input Handling for Rectangular Inputs. #422

Open Ryanshuai opened 1 month ago

Ryanshuai commented 1 month ago

Problem: Multi-scale inputs are handled inconsistently between training and inference.

Details: During training, if an input image's height differs from its width, the multi-scale step resizes the image to a square. During inference, however, inputs keep their rectangular shape. This mismatch substantially degrades model performance: mean Average Precision (mAP) drops by approximately 20% on our company's datasets.

Modifications:

  1. Multi-Scale Feature Interpolation: During training with multi-scale enabled, input images are resized so that the longer side (width or height, whichever is larger) matches sz.
  2. HybridEncoder Adjustments: The HybridEncoder now uses bilinear interpolation for better feature alignment, because the scale change is not guaranteed to be an integer.
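
To illustrate why the two modifications go together, here is a minimal sketch in plain Python. It is not the code in this PR. The helper names `resize_longer_side` and `feature_sizes` are hypothetical, the resize is assumed to preserve the aspect ratio, and each stride-2 stage of the backbone is assumed to map a side of length n to ceil(n / 2).

```python
import math


def resize_longer_side(height, width, sz):
    """Return (new_h, new_w) with the longer side equal to sz.

    Assumption: the aspect ratio is preserved and the shorter side is rounded.
    """
    longer = max(height, width)
    new_h = max(1, round(height * sz / longer))
    new_w = max(1, round(width * sz / longer))
    return new_h, new_w


def feature_sizes(n, strides=(8, 16, 32)):
    """Feature-map length along one axis at each stride.

    Assumption: every stride-2 stage maps n to ceil(n / 2), as a 3x3
    convolution with stride 2 and padding 1 does.
    """
    return [math.ceil(n / s) for s in strides]


# A 480x640 (height x width) image with sz = 576 keeps its rectangular shape.
new_h, new_w = resize_longer_side(480, 640, 576)
print(new_h, new_w)  # 432 576

# Along the width, each level is exactly 2x the next, so upsampling by 2 lines up.
print(feature_sizes(new_w))  # [72, 36, 18]

# Along the height, 14 * 2 = 28 but the next level has 27 rows, so the
# effective scale is 27 / 14 and a fixed factor of 2 no longer matches.
print(feature_sizes(new_h))  # [54, 27, 14]
```

In this example the stride-32 map would have to be resized to the exact size of the stride-16 map rather than upsampled by a fixed factor of 2. That is the case where interpolating to a target size with bilinear mode, as item 2 describes, is a natural fit.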