Problem:
Inconsistency issues identified between the training and inference phases related to how multi-scale inputs are handled.
Details:
During training, if the input image's height does not equal its width, our multi-scale process adjusts the image to be square. However, during inference, the inputs are maintained as rectangles. This inconsistency has led to a substantial degradation in model performance, with a decrease in mean Average Precision (mAP) by approximately 20% on our company's datasets.
Modifications:
Multi-Scale Feature Interpolation: During training with multi-scale enabled, input images will be resized to match sz, whichever is larger in width or height.
HybridEncoder Adjustments: In the HybridEncoder, bilinear interpolation is used for better feature alignment since the scale change is not an integer.
Problem: Inconsistency issues identified between the training and inference phases related to how multi-scale inputs are handled.
Details: During training, if the input image's height does not equal its width, our multi-scale process adjusts the image to be square. However, during inference, the inputs are maintained as rectangles. This inconsistency has led to a substantial degradation in model performance, with a decrease in mean Average Precision (mAP) by approximately 20% on our company's datasets.
Modifications: