raoyongming / DynamicViT

[NeurIPS 2021] [T-PAMI] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
https://dynamicvit.ivg-research.xyz/
MIT License

The problem of discarding a fixed number of tokens during the inference stage #32

Closed MooresS closed 1 year ago

MooresS commented 1 year ago

Thank you for your contribution. In the code, the keep ratios for tokens are 0.7, 0.7^2, and 0.7^3. Was this hyperparameter obtained through repeated experiments? Because the inference stage directly discards a fixed number of tokens, it differs from the training stage. If the ratio is not set consistently with training, performance will be very poor. Is there a better way to set this hyperparameter? Thank you for your reply.
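
For concreteness, here is a minimal sketch (not the repository's code; `base_keep_ratio`, `num_tokens`, and `pruning_stages` are illustrative names) of how a base keep ratio of 0.7 compounds across three pruning stages into per-stage token counts at inference time:

```python
# Minimal sketch of how the compounding keep ratio determines
# how many tokens survive each pruning stage at inference time.
base_keep_ratio = 0.7
num_tokens = 196       # e.g. 14x14 patches for a 224x224 input
pruning_stages = 3     # tokens are pruned at three points in the network

for stage in range(1, pruning_stages + 1):
    ratio = base_keep_ratio ** stage          # 0.7, 0.7^2, 0.7^3
    kept = int(num_tokens * ratio)            # fixed count kept at this stage
    print(f"stage {stage}: keep ratio {ratio:.3f} -> {kept} tokens")

# stage 1: keep ratio 0.700 -> 137 tokens
# stage 2: keep ratio 0.490 -> 96 tokens
# stage 3: keep ratio 0.343 -> 67 tokens
```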

raoyongming commented 1 year ago

Hi @MooresS, we use a ratio loss during training to make sure the model is suited to the target sparsification ratio (see Eq. 15 in our conference paper). Therefore, different DynamicViT models have different optimal rho values, and we usually use the same rho during inference as during training.
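
For reference, here is a rough PyTorch sketch of the idea behind the ratio loss: an MSE term that pushes the average fraction of kept tokens at each pruning stage toward the target rho^s. The `decision_masks` input and the function shape are assumptions for illustration, not the repository's exact implementation; please refer to Eq. 15 in the paper for the precise formulation.

```python
import torch

def ratio_loss(decision_masks, base_keep_ratio=0.7):
    """Sketch of a ratio loss in the spirit of Eq. 15.

    decision_masks (assumed input): list of (B, N) soft keep-masks in
    [0, 1], one per pruning stage, where N is the number of tokens.
    """
    loss = 0.0
    for s, mask in enumerate(decision_masks, start=1):
        target = base_keep_ratio ** s        # rho, rho^2, rho^3, ...
        kept_fraction = mask.mean(dim=1)     # per-sample fraction of kept tokens
        loss = loss + ((kept_fraction - target) ** 2).mean()
    return loss / len(decision_masks)

# Illustrative usage: three stages, batch of 8, 196 tokens.
masks = [torch.rand(8, 196) for _ in range(3)]
print(ratio_loss(masks))
```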