I also have a question about the "gamma_1" and "gamma_2" parameters in DeiT_3. There is no mention of these parameters in the paper. Could you please provide some explanation or experimental results?
The original code of DeiT or DeiT_3 doesn't add the positional embedding for the cls token.
I don't understand what you mean.
gamma_1 and gamma_2 are the layer scale parameters. You can refer to CaiT or DeiT III to learn about them. DeiT doesn't include them.
Besides, DeiT III here is for inference only. The weights are transferred from the original DeiT III repo. You cannot reproduce DeiT III's training results with this repo.
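
To make the layer scale idea concrete, here is a minimal, framework-free sketch of how gamma_1 and gamma_2 are typically used: each is a learnable per-channel vector that scales the output of a residual branch (attention for gamma_1, MLP for gamma_2) before it is added back to the input. This is only an illustration, not the code from this repo or from the DeiT III repo. The helper names (`scale_and_add`, `block`, `toy_branch`), the toy branch, and the init value of 1e-4 are assumptions, and the normalization layers are left out for brevity.

```python
# Illustrative sketch of layer scale in one transformer block.
# All names and the init value below are assumptions for the example.

def scale_and_add(x, branch_out, gamma):
    """Return x + gamma * branch_out, with gamma applied per channel.

    x, branch_out: list of tokens, each a list of `dim` floats.
    gamma: list of `dim` floats (learnable in a real model).
    """
    return [
        [xi + g * bi for xi, bi, g in zip(tok, btok, gamma)]
        for tok, btok in zip(x, branch_out)
    ]


def block(x, attn, mlp, gamma_1, gamma_2):
    """One block with layer scale (normalization omitted for brevity)."""
    x = scale_and_add(x, attn(x), gamma_1)  # gamma_1 scales the attention branch
    x = scale_and_add(x, mlp(x), gamma_2)   # gamma_2 scales the MLP branch
    return x


def toy_branch(tokens):
    """Stand-in for attention or MLP: just doubles every value."""
    return [[2.0 * v for v in tok] for tok in tokens]


dim = 4
init_value = 1e-4              # assumed small init; real configs may differ
gamma_1 = [init_value] * dim
gamma_2 = [init_value] * dim

x = [[1.0, 2.0, 3.0, 4.0]]     # one token with `dim` channels
y = block(x, toy_branch, toy_branch, gamma_1, gamma_2)
```

Because gamma starts small, each block is close to the identity at the start of training, and the model learns how much each branch should contribute per channel.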
Thanks for your great work!
I have a few questions about the modifications in DeiT_3.