Open 1171000410 opened 2 years ago
No, the resulting difference would be negligible.
Hi, I would like to add a question. After reading your code, I wonder whether token reorganization is used in only three layers during the test phase of DeiT-S, but in all layers during the training phase.
Lines 398-399 in evit.py:

    if not isinstance(keep_rate, (tuple, list)):
        keep_rate = (keep_rate,) * self.depth
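As a side note on that snippet: it simply broadcasts a scalar keep rate into one value per transformer layer. A minimal standalone sketch of the same logic (the function name here is ours, not from evit.py):

```python
def expand_keep_rate(keep_rate, depth=12):
    """Broadcast a scalar keep rate to one value per transformer layer."""
    if not isinstance(keep_rate, (tuple, list)):
        keep_rate = (keep_rate,) * depth
    return keep_rate

# A scalar becomes a length-`depth` tuple; tuples/lists pass through unchanged.
print(expand_keep_rate(0.7, depth=12))
print(expand_keep_rate((0.7, 1.0), depth=12))
```

So passing a single float configures every layer with the same value, while passing a tuple lets you set each layer individually.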
Thanks for your question.
No, it is applied in only three layers. The `keep_rate` you mentioned controls the keep rate during training and is different from the keep rate used at inference (the `self.keep_rate` in the `Attention` module is the actual keep rate, while the `keep_rate` you mentioned will be `None` at inference). Another way to control the number of kept tokens is to change `tokens`. See also line 208: https://github.com/youweiliang/evit/blob/29c7f2a67192eda0d2957402228065581a071bd5/evit.py#L208
In other words, the code provides several ways to control the keep rate in training and inference, and it is up to the user how to set it. The default, however, is to apply token reorganization in three layers.
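To make the "three layers" default concrete, a per-layer keep-rate tuple that prunes in only three layers might look like the sketch below (the layer indices 3, 6, 9 and the rate 0.7 are illustrative assumptions, not values read from the repository's configuration):

```python
depth = 12                # DeiT-S has 12 transformer blocks
base_keep_rate = 0.7      # illustrative target keep rate
prune_layers = {3, 6, 9}  # hypothetical indices of the three reorganization layers

# Layers in prune_layers keep base_keep_rate of the tokens; all other
# layers keep every token (keep rate 1.0, i.e. no reorganization).
keep_rate = tuple(
    base_keep_rate if i in prune_layers else 1.0 for i in range(depth)
)
print(keep_rate)
```

Passing such a tuple instead of a scalar is one of the ways mentioned above to control exactly where and how strongly tokens are pruned.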
Hello, I would like to ask: if the warmup strategy is not used and the keep rate is instead set directly to the target value, will the experimental results differ greatly?
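For readers unfamiliar with the warmup being asked about: it gradually anneals the keep rate from 1.0 (no pruning) down to the target value over the early part of training. A minimal sketch, assuming a linear schedule (the function name, signature, and schedule shape are our assumptions, not the repository's implementation):

```python
def keep_rate_with_warmup(epoch, warmup_epochs, target_rate):
    """Linearly anneal the keep rate from 1.0 to target_rate over warmup_epochs."""
    if epoch >= warmup_epochs:
        return target_rate
    # Fraction of warmup completed determines how far we have moved toward the target.
    progress = epoch / warmup_epochs
    return 1.0 - (1.0 - target_rate) * progress

# Early in training every token is kept; the rate then decays toward the target.
for e in (0, 15, 30, 60):
    print(e, keep_rate_with_warmup(e, warmup_epochs=30, target_rate=0.7))
```

The question above is whether skipping this schedule, i.e. training with `target_rate` from epoch 0, changes the final accuracy; per the reply at the top of the thread, the difference is negligible.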