Closed · HJQjob closed this issue 5 months ago
Hello, thank you for your open-source contribution. Your checkpoint provides higher accuracy than the original model, but I have a question about GFLOPs: I downloaded the MCTF and MCTF-Fast checkpoints you provided and found that the model size is identical to the original DeiT-T (22 MB). After checking the dimensions of the weights in each layer, nothing has changed, and the original model's `timm.create_model()` loads your checkpoint perfectly. The result looks very similar to unstructured pruning (except that the weights do not contain the large number of zero values unstructured pruning would leave). How can this kind of pruning, which does not change the model structure, reduce the model's FLOPs and achieve acceleration?
Thank you for your interest in our paper.
As you mentioned, the model structure and size are the same as the original DeiT. Rather than pruning the structure of the model, we reduce FLOPs by reducing the number of tokens layer by layer.
Thank you.
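For intuition, here is a minimal sketch (not the authors' implementation) of why shrinking the token sequence cuts compute while every weight tensor keeps its shape: both the attention and MLP FLOPs of a transformer block scale with the number of tokens, so fusing tokens after each block lowers the total even with identical weights. The per-block formulas are the standard back-of-the-envelope estimates, and the fixed per-layer reduction `r` is an illustrative schedule, not necessarily the exact one used in MCTF.

```python
# Toy FLOPs count for a DeiT-T style transformer (depth 12, width 192),
# showing how token fusion reduces compute without touching the weights.

def block_flops(n: int, d: int, mlp_ratio: int = 4) -> int:
    """Approximate FLOPs of one transformer block with n tokens, width d."""
    attn_proj = 4 * n * d * d          # QKV + output projections
    attn_mix = 2 * n * n * d           # QK^T and attention @ V
    mlp = 2 * mlp_ratio * n * d * d    # the two linear layers of the MLP
    return attn_proj + attn_mix + mlp

def model_flops(n_tokens: int, depth: int = 12, d: int = 192, r: int = 0) -> int:
    """Total FLOPs when r tokens are fused away after every block."""
    total, n = 0, n_tokens
    for _ in range(depth):
        total += block_flops(n, d)
        n = max(n - r, 1)              # the sequence shrinks; the weights do not
    return total

base = model_flops(197)                # DeiT-T: 196 patch tokens + [CLS]
fused = model_flops(197, r=16)         # e.g. fuse 16 tokens per layer
print(f"baseline: {base / 1e9:.2f} GFLOPs, with token fusion: {fused / 1e9:.2f} GFLOPs")
```

Because only the activations shrink, every weight tensor keeps its original shape. That is why `timm.create_model()` loads the checkpoint unchanged and the file stays at 22 MB: the saving shows up in runtime compute, not in the state dict.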