raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

Some questions about your code #22

Closed leoozy closed 2 years ago

leoozy commented 2 years ago

Hello, thank you for your code. I spent some time reading it carefully, but I still cannot understand the following lines (https://github.com/raoyongming/DynamicViT/blob/84b4e2a9b1f11199bd1e2ff506969b0d64e6f55b/models/dyvit.py#L174). Could you please give me some advice? Why subtract the max value from the attention scores?

raoyongming commented 2 years ago

Hi, thanks for your interest in our work.

This part of the code is our own implementation of the softmax function. Since we need to multiply the attention scores by the attention policy (attn_policy, Eq. 11 in our NeurIPS paper), we cannot directly use PyTorch's built-in softmax. We subtract the max_att values in line 174 to avoid overflow in the exponential, since softmax(X) = softmax(X - m) for any constant m. You may refer to the discussions here.
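
For concreteness, the idea is roughly the following (a simplified sketch; the function name masked_softmax and the exact shapes here are illustrative, and the actual code in dyvit.py handles additional details such as mixed precision):

```python
import torch

def masked_softmax(attn, policy, eps=1e-6):
    # attn:   (B, H, N, N) raw (pre-softmax) attention scores
    # policy: (B, N) keep mask, 1.0 for kept tokens and 0.0 for pruned ones
    B, N = policy.shape
    attn_policy = policy.reshape(B, 1, 1, N)  # broadcast over heads and queries

    # Subtract the row-wise max before exponentiating: softmax(x) = softmax(x - m)
    # for any constant m, so this changes nothing mathematically but keeps
    # exp() from overflowing.
    max_att = attn.max(dim=-1, keepdim=True)[0]
    attn = (attn - max_att).exp() * attn_policy

    # Renormalize over the kept tokens only; eps guards against a fully
    # masked (all-zero) row.
    return attn / (attn.sum(dim=-1, keepdim=True) + eps)
```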

leoozy commented 2 years ago

Thank you for your rapid reply!