yzd-v / cls_KD

'NKD and USKD' (ICCV 2023) and 'ViTKD' (CVPRW 2024)
Apache License 2.0

Positional Embedding #10

Closed user3984 closed 1 year ago

user3984 commented 1 year ago

Thanks for your great work!

I have a few questions about the modifications in DeiT_3.

  1. Why do you remove the positional embedding for the cls token?
  2. Do you simply omit the dist token and the positional embeddings for both tokens when transferring weights from DeiT?
user3984 commented 1 year ago

I also have a question about the "gamma_1" and "gamma_2" parameters in DeiT_3. There is no mention of these parameters in the paper. Could you please provide some explanation or experimental results?

yzd-v commented 1 year ago
  1. The original code of DeiT and DeiT_3 does not add a positional embedding to the cls token.
  2. I don't understand what you mean.
  3. Gamma_1 and gamma_2 are layer scale parameters. You can refer to CaiT or DeiT III to learn about them. DeiT does not include layer scale. Besides, DeiT III in this repo is for inference only: the weights are transferred from the original DeiT III repo, and you cannot reproduce DeiT III's results with this repo.
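For anyone reading along, points 1 and 3 can be sketched roughly as follows. This is a minimal NumPy illustration under my own assumptions about names and shapes, not the actual code of this repo or of DeiT III:

```python
import numpy as np

dim, num_patches = 4, 196  # illustrative sizes, not DeiT III's real dims

# Point 1: the positional embedding is added to the patch tokens only;
# the cls token is concatenated afterwards without one.
patch_tokens = np.zeros((1, num_patches, dim))
pos_embed = np.random.randn(1, num_patches, dim)
cls_token = np.zeros((1, 1, dim))
x = np.concatenate([cls_token, patch_tokens + pos_embed], axis=1)

# Point 3: layer scale (from CaiT, used in DeiT III) multiplies each
# residual branch output channel-wise by a small learnable vector
# (gamma_1 for attention, gamma_2 for the MLP), initialized near zero.
gamma_1 = np.full(dim, 1e-5)
attn_out = np.ones_like(x)   # stand-in for an attention block's output
x = x + gamma_1 * attn_out   # scaled residual update

print(x.shape)  # (1, 197, 4): 196 patch tokens plus one cls token
```

In a real model gamma_1 and gamma_2 would be learnable parameters, one pair per transformer block.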