I noticed that there is a "--tta" option in the args. Did you use test-time augmentation (TTA) at inference to obtain the results reported in the paper?
I also noticed that you specified unusual "model-ema-decay" values (e.g. 0.99984 for CSwin-tiny and 0.99992 for CSwin-base). Did you use the EMA model for inference to obtain the results in the paper?
How much does each of the two factors mentioned above (TTA and the EMA model) affect model performance?
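
For context on the EMA question, here is a minimal sketch of how I understand an exponential moving average of weights to work. This is my assumption about the standard technique, not taken from your code: `ema_update` and `effective_window` are hypothetical helper names, and the "1 / (1 - decay)" window is only the usual rule of thumb.

```python
def ema_update(ema_params, model_params, decay):
    """One EMA step: blend the running average with the current weights.

    Hypothetical helper for illustration; parameters are plain floats here.
    """
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


def effective_window(decay):
    """Rule-of-thumb number of recent steps the average spans: 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


if __name__ == "__main__":
    # Decay values quoted in the question above.
    for name, decay in [("CSwin-tiny", 0.99984), ("CSwin-base", 0.99992)]:
        print(name, round(effective_window(decay)))
```

Under this rule of thumb, the two decay values correspond to averaging over roughly 6250 and 12500 update steps respectively, which is why I am curious whether the reported numbers come from the EMA weights or the raw weights.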