uzh-rpg / svit

Official implementation of "SViT: Revisiting Token Pruning for Object Detection and Instance Segmentation"
Apache License 2.0

Attention type in code #7

Closed King4819 closed 3 months ago

King4819 commented 3 months ago

Hi, I want to ask whether the code implements global attention or deformable attention.

I want to calculate the FLOPs drop ratio based on the number of tokens each ViT layer uses. Is the FLOPs cost of deformable attention proportional to the number of tokens, i.e., O(N)?

Thank you very much!

kaikai23 commented 3 months ago

Hi,

In the classification code, we use only global attention, which is applied to all remaining tokens after pruning. You can easily write FLOPs as a function of the number of tokens by replacing the '197' with the actual number of tokens in a ViT layer; please refer to the calculations here.
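For illustration, here is a minimal sketch of such a per-layer FLOPs function. It assumes a standard ViT-B encoder block (embedding dim 768, MLP ratio 4) and counts only the matrix multiplications; the exact calculation linked above may differ in detail, and the function name and defaults are hypothetical.

```python
def vit_layer_flops(num_tokens: int, dim: int = 768, mlp_ratio: int = 4) -> int:
    """Approximate FLOPs of one ViT encoder layer with global attention.

    Only matrix multiplications are counted; layer norms, softmax, and
    biases are ignored. `num_tokens` replaces the fixed 197 used when
    no tokens are pruned (ViT-B/16 at 224x224).
    """
    n, d = num_tokens, dim
    qkv = 3 * n * d * d                 # Q, K, V projections
    attn_scores = n * n * d             # Q @ K^T
    attn_values = n * n * d             # softmax(QK^T) @ V
    proj = n * d * d                    # attention output projection
    mlp = 2 * n * d * (mlp_ratio * d)   # two MLP linear layers
    return qkv + attn_scores + attn_values + proj + mlp


# Example: FLOPs drop when a layer keeps 120 of the original 197 tokens
full = vit_layer_flops(197)
pruned = vit_layer_flops(120)
print(f"drop ratio: {1 - pruned / full:.2%}")
```

Summing this over all layers, each with its own remaining token count, gives the overall FLOPs drop ratio.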

For deformable attention in the detection code, it is difficult to find a tool to calculate the FLOPs since a CUDA implementation is involved. I avoided calculating the FLOPs there and reported the speed instead (frames per second).
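If you want to reproduce the speed comparison, a rough FPS measurement can be done with a sketch like the one below. This is not the repository's benchmarking script; it assumes a PyTorch model that can be called directly on a prepared batch, and `measure_fps` is a hypothetical helper.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, warmup: int = 10, iters: int = 50) -> float:
    """Rough frames-per-second measurement on a fixed, preloaded batch."""
    model.eval()
    for _ in range(warmup):          # warm up CUDA kernels and caches
        model(images)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(images)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return iters * len(images) / elapsed
```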

King4819 commented 3 months ago

Thanks for your reply!