xxxnell / how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
https://arxiv.org/abs/2202.06709
Apache License 2.0

Hi, something about object detection... #20

Closed ross-Hr closed 1 year ago

ross-Hr commented 1 year ago

Hi, I am ross. Excellent work! The experiments are basically all classification problems.

Do the analysis results change much if the task switches to object detection?

xxxnell commented 1 year ago

Hi ross, thank you for the question. As mentioned in the Discussion section, we believe that self-attention can significantly improve results in dense prediction tasks and that the key results remain the same. However, we leave a detailed investigation for future work. In fact, I am considering analyzing the characteristics of ViTs in dense prediction tasks as a follow-up. Anyway, thank you for the kind words!