xingyizhou / GTR

Global Tracking Transformers, CVPR 2022

Question about inference resolution #36

Open pietro-nardelli opened 2 years ago

pietro-nardelli commented 2 years ago

During MOT training the input resolution is set to 1280x1280, while the test size is 1560 (longer edge). This means the training inputs are square and lower resolution, whereas the test frames are rectangular and higher resolution. I tried testing on videos with the same resolution and aspect ratio as in training (1280x1280), but performance got worse.

My question is: how can performance degrade when the test inputs match the training aspect ratio and resolution? Shouldn't the network perform better in that situation? If not, what is the reason? Maybe I am missing some property of the detector or transformer module.
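
To make the size mismatch concrete, here is a minimal sketch of the arithmetic, assuming a hypothetical 1920x1080 source frame (the issue does not state the source video size). The helper `resize_longer_edge` is illustrative only and is not part of GTR; the repository's actual resizing and augmentation steps may differ.

```python
def resize_longer_edge(width, height, target):
    """Scale (width, height) so the longer edge equals `target`, keeping aspect ratio."""
    scale = target / max(width, height)
    return round(width * scale), round(height * scale), scale


# Hypothetical source frame size (an assumption, not taken from the issue).
src_w, src_h = 1920, 1080

# Test-time setting from the issue: longer edge resized to 1560.
test_w, test_h, test_scale = resize_longer_edge(src_w, src_h, 1560)

# Fitting the same frame inside a 1280x1280 square without distortion.
sq_w, sq_h, sq_scale = resize_longer_edge(src_w, src_h, 1280)

# How large objects appear in the square setting relative to the test setting.
ratio = sq_scale / test_scale

print(f"test setting:   {test_w}x{test_h} (scale {test_scale:.3f})")
print(f"square setting: {sq_w}x{sq_h} (scale {sq_scale:.3f})")
print(f"relative object size: {ratio:.3f}")
```

Under these assumptions, objects in the 1280x1280 setting appear roughly 18% smaller than with the 1560 longer-edge setting, so matching the training canvas size does not necessarily match the apparent object scale.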