Open guojiajeremy opened 2 months ago
+1
@haoqiwang
Just came across this paper; it looks very interesting. From what DINOv2 reports, the outlier tokens only show up in a sufficiently large model after sufficiently long training. I guess it wouldn't make much sense with the ViT-S/B versions of DINOv2, but a ViT-L or ViT-B from supervised training or OpenCLIP could be interesting to see. Also, DINOv2 officially seems to have released only the ViT-g trained from scratch; their ViT-L/B/S are distilled from it. I don't know whether there are still outliers in those distilled models.
Thank you for your GREAT work.
The default ViT size in the paper and the released weights is ViT-g, which is too large for researchers and users with limited resources. Could you please release weights for smaller ViT versions, such as ViT-Small/Base/Large?