BinhuiXie closed this issue 1 year ago.
Hi @wlin-at
Excellent work!
I have some questions about the results in Table 1. In your paper, "clean" refers to the performance of the model on the original validation set; specifically, for K400 and SSv2 the results are 75.32 and 66.36, respectively. However, the VideoSwin repo reports higher results. Could you help me out? I really appreciate any help you can provide.
Hi, thanks for your interest in the work! VideoSwin uses 4x3 = 12 views during inference, and the final score is computed as the average over all the views. In our inference, we take only a 1x1 view (center crop, one uniformly sampled clip), both for evaluation on clean data and for test-time adaptation, for an efficient implementation. The implementation details are given in both papers.
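The following minimal Python sketch illustrates the gap between 4x3 and 1x1 inference. It assumes the usual convention that "4x3" means 4 temporal clips x 3 spatial crops and that the final prediction is the element-wise mean of the per-view class scores. The helpers `temporal_starts`, `spatial_offsets`, `make_views`, and `average_view_scores`, along with the scorer passed to them, are hypothetical illustrations. They are not taken from the VideoSwin or paper code, and real pipelines may place clips and crops differently.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-view scorer: (clip_start_frame, crop_offset) -> class scores.
ScoreFn = Callable[[int, int], Sequence[float]]


def temporal_starts(num_frames: int, clip_len: int, num_clips: int) -> List[int]:
    """Start frames of `num_clips` evenly spaced clips; a single clip is centred."""
    max_start = max(num_frames - clip_len, 0)
    if num_clips == 1:
        return [max_start // 2]
    return [i * max_start // (num_clips - 1) for i in range(num_clips)]


def spatial_offsets(frame_size: int, crop_size: int, num_crops: int) -> List[int]:
    """Crop offsets along the long side: 1 -> centre crop, 3 -> start/centre/end."""
    max_off = max(frame_size - crop_size, 0)
    if num_crops == 1:
        return [max_off // 2]
    if num_crops == 3:
        return [0, max_off // 2, max_off]
    raise ValueError("this sketch only handles 1 or 3 spatial crops")


def make_views(num_frames: int, clip_len: int, frame_size: int, crop_size: int,
               num_clips: int, num_crops: int) -> List[Tuple[int, int]]:
    """All (clip_start, crop_offset) pairs, i.e. num_clips x num_crops views."""
    return list(product(temporal_starts(num_frames, clip_len, num_clips),
                        spatial_offsets(frame_size, crop_size, num_crops)))


def average_view_scores(score_fn: ScoreFn, views: List[Tuple[int, int]]) -> List[float]:
    """Final prediction: element-wise mean of the class scores over all views."""
    per_view = [list(score_fn(t, s)) for t, s in views]
    num_classes = len(per_view[0])
    return [sum(v[c] for v in per_view) / len(per_view) for c in range(num_classes)]


# Illustrative sizes only: 300 frames, 32-frame clips, 298-pixel long side, 224 crops.
multi = make_views(300, 32, 298, 224, num_clips=4, num_crops=3)
single = make_views(300, 32, 298, 224, num_clips=1, num_crops=1)
print(len(multi), len(single))  # 12 1
```

With a 1x1 view, only the centre clip and centre crop are scored, so the model runs once per video instead of 12 times. This is why single-view clean accuracy can be lower than the multi-view numbers reported in the VideoSwin repo.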
Thanks a lot!