mlvlab / MCTF

Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers".
MIT License

Regarding the issue of GFLOPs and acceleration #1

Closed HJQjob closed 5 months ago

HJQjob commented 5 months ago

Hello, thank you for your open-source contribution. Your checkpoint achieves higher accuracy than the original model, but I have a question about GFLOPs. I downloaded the MCTF and MCTF-Fast checkpoints you provided and found that the model size matches the original DeiT-T (22 MB). After checking the dimensions of the weights in each layer, I found no change: the original model created with timm's `create_model()` fits your checkpoint perfectly. The result resembles unstructured pruning, except that it lacks the many zero-valued weights such pruning would leave behind. How can this approach, which leaves the structure unchanged, reduce the model's FLOPs and achieve acceleration?

pizard commented 5 months ago

Thank you for your interest in our paper.

As you mentioned, the model structure and size are the same as the original DeiT. Rather than pruning the model's structure, our method reduces FLOPs by fusing tokens, which decreases the number of tokens processed at each successive layer. Since both the attention and MLP costs of a transformer block depend on the sequence length, fewer tokens means fewer FLOPs, even though every weight tensor keeps its original shape.
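To make the FLOPs argument concrete, here is a minimal sketch (not code from this repository) using standard per-block cost estimates for a ViT: projections and MLP scale linearly with the token count N, while the attention matmuls scale quadratically. The dimension (192), layer count (12), token count (197), and per-layer fusion rate `r` below are illustrative values for a DeiT-T-like model, not the paper's exact schedule.

```python
def block_flops(n_tokens: int, dim: int = 192) -> int:
    """Approximate multiply-accumulates for one transformer block with N tokens."""
    proj = 4 * n_tokens * dim * dim            # QKV + output projections: 4*N*d^2
    attn = 2 * n_tokens * n_tokens * dim       # QK^T and attn@V matmuls: 2*N^2*d
    mlp = 8 * n_tokens * dim * dim             # MLP with 4x expansion: 8*N*d^2
    return proj + attn + mlp


def model_flops(n_layers: int = 12, n_tokens: int = 197, r: int = 0) -> int:
    """Total block FLOPs when r tokens are fused away after each layer."""
    total = 0
    for _ in range(n_layers):
        total += block_flops(n_tokens)
        n_tokens = max(n_tokens - r, 1)        # token fusion shrinks the sequence
    return total


baseline = model_flops(r=0)                    # plain DeiT-T: all 197 tokens everywhere
fused = model_flops(r=16)                      # fuse 16 tokens per layer (illustrative)
print(f"relative FLOPs with token fusion: {fused / baseline:.2f}")
```

Note that the weight matrices (of shape `dim x dim` and so on) never change, which is why the checkpoint loads into the unmodified DeiT-T; only the activations passing through them get shorter.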

Thank you.