Open JoakimHaurum opened 4 days ago
Hi, we initialize a random CLS token for these methods. Although there is no explicit supervision for the CLS token, we found that it can still effectively serve as the selector, as discussed in Appendix D of the paper.
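As a rough illustration of what "a randomly initialized CLS token acting as the selector" can look like, here is a minimal NumPy sketch. It is not the code from the paper or the repository. The helper names (`init_cls_token`, `cls_attention_scores`, `select_tokens`), the 0.02 init scale, the single head without learned projections, and the top-k rule are all assumptions made for illustration. It shows generic selection by CLS attention, not the exact ATS or EvoViT procedures.

```python
import numpy as np


def init_cls_token(dim, rng, std=0.02):
    # Hypothetical helper: draw a fresh CLS token from N(0, std^2).
    # The 0.02 scale is an assumed, commonly used ViT init value.
    return rng.normal(0.0, std, size=(1, dim))


def cls_attention_scores(cls_token, patch_tokens):
    # Scaled dot-product attention of the CLS query over the patch keys.
    # Simplification: one head, no learned query/key projections.
    dim = patch_tokens.shape[-1]
    logits = (cls_token @ patch_tokens.T) / np.sqrt(dim)  # shape (1, N)
    logits = logits - logits.max()  # numerical stability for the softmax
    weights = np.exp(logits)
    return (weights / weights.sum()).ravel()  # shape (N,)


def select_tokens(patch_tokens, scores, keep_ratio=0.5):
    # Keep the patch tokens that receive the most CLS attention,
    # preserving their original spatial order.
    num_keep = max(1, int(round(keep_ratio * len(scores))))
    keep_idx = np.sort(np.argsort(-scores)[:num_keep])
    return patch_tokens[keep_idx], keep_idx


rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(16, 8))  # 16 patch tokens of width 8
cls_token = init_cls_token(8, rng)       # random CLS token, no supervision
scores = cls_attention_scores(cls_token, patch_tokens)
kept, keep_idx = select_tokens(patch_tokens, scores, keep_ratio=0.5)
```

In a real backbone the CLS token would be a learnable parameter passed through the transformer layers together with the patch tokens, so its attention pattern is shaped by training even without a loss applied directly to it.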
When you tested the ATS and EvoViT pruning methods, how exactly did you incorporate the CLS token? As you mention, the CLS token is not "natural" for dense tasks, but since you use a DeiT backbone, one should already be available from there. Do you simply reuse the DeiT CLS token (even though it is not trained during the ViT-Adapter dense training), or do you initialize a new random token after the dense training?