microsoft / esvit

EsViT: Efficient self-supervised Vision Transformers
MIT License

[QUESTION] Results on correspondence learning #13

Closed: tileb1 closed this issue 3 years ago

tileb1 commented 3 years ago

Hello, I cannot find in the paper which features are used for the correspondence matching in the appendix. Is it the last-layer features (coarse-grained), the first-layer features (fine-grained), or a combination of features from all depths (and if so, how are they combined)? Thanks!

ChunyuanLI commented 3 years ago

Only the last-layer features are used.
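
For readers who want to see what matching on a single layer of features can look like, here is a minimal sketch. It is not the EsViT code: it assumes the last-layer patch features of two images have already been extracted into `(num_patches, dim)` arrays, and it assumes cosine-similarity nearest-neighbor matching, which the thread does not confirm. The function name `match_patches` and the toy data are hypothetical.

```python
import numpy as np


def match_patches(feats_a, feats_b):
    """Match each patch of image A to its most similar patch of image B.

    feats_a: (Na, D) array of patch features from one layer (here, the last).
    feats_b: (Nb, D) array of patch features from the same layer.
    Returns (indices, scores): for every patch in A, the index of the best
    patch in B and the cosine similarity of that pair.
    Assumption: cosine similarity is the matching criterion.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    norm_a = np.maximum(np.linalg.norm(feats_a, axis=1, keepdims=True), 1e-12)
    norm_b = np.maximum(np.linalg.norm(feats_b, axis=1, keepdims=True), 1e-12)
    sim = (feats_a / norm_a) @ (feats_b / norm_b).T  # shape (Na, Nb)
    return sim.argmax(axis=1), sim.max(axis=1)


# Toy data standing in for real features: image A's patches are a shuffled,
# rescaled copy of image B's, so the correct matches are known in advance.
rng = np.random.default_rng(0)
feats_b = rng.normal(size=(49, 16))   # e.g. a 7x7 grid of 16-dim features
perm = rng.permutation(49)
feats_a = 2.0 * feats_b[perm]         # cosine similarity ignores the scale

indices, scores = match_patches(feats_a, feats_b)
```

In this toy case `indices` recovers `perm`. With real features, the patch indices would still need to be mapped back to pixel coordinates using the patch grid of the final stage.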