During MOT training the input resolution is set to 1280x1280 while the test size is 1560 (longer edge).
This means that the training frames are square and lower-resolution, while the test frames are rectangular and higher-resolution.
I tried testing on videos with the same resolution and aspect ratio as in training (1280x1280), but that gave the worst performance.
My question is: how can performance get worse when the test input keeps the same aspect ratio and resolution as in training? Shouldn't the network perform better in that situation? If not, what is the reason? Maybe I am missing some property of the detector/transformer module.
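For concreteness, this is how I understand the "longer edge" test resize. It is only a sketch: the helper name `resize_longer_edge` and the 1920x1080 source frame are my own assumptions for illustration, not something taken from the repository or its config.

```python
def resize_longer_edge(width, height, target):
    """Scale (width, height) so the longer edge equals `target`,
    keeping the aspect ratio. Hypothetical helper, not the repo's code."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale)


if __name__ == "__main__":
    # Assumed source frame size (illustrative only).
    src_w, src_h = 1920, 1080

    # Test-time setting from the question: longer edge = 1560.
    test_w, test_h = resize_longer_edge(src_w, src_h, 1560)
    print("test input:", test_w, "x", test_h)

    # Training-time setting from the question: fixed square input.
    train_w, train_h = 1280, 1280
    print("train input:", train_w, "x", train_h)

    # Aspect ratios of the two inputs, to make the mismatch explicit.
    print("test aspect ratio:", test_w / test_h)
    print("train aspect ratio:", train_w / train_h)
```

If the actual preprocessing pads or crops instead of scaling like this, the numbers above would not apply.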