hustvl / MIMDet

[ICCV 2023] You Only Look at One Partial Sequence
https://arxiv.org/abs/2204.02964
MIT License

Could you provide a model with 768-dimensional output image features? #6

Closed: Pter61 closed this issue 2 years ago

Pter61 commented 2 years ago

Many works require image embeddings with the same dimension as the text embeddings (e.g., BERT-base: 768-dim, BERT-large: 1024-dim). However, I noticed that you only provide models with 1024-dimensional features. Could you also provide models with 768-dimensional features?

Yuxin-CV commented 2 years ago

Hi. The embedding dimension of all the ViT-Base models is 768.
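For reference, here is a minimal sketch (not part of MIMDet) that pairs a text encoder with a ViT variant of the same width. It assumes the standard hidden sizes from the original ViT and BERT papers (ViT-Base 768, ViT-Large 1024, ViT-Huge 1280; BERT-base 768, BERT-large 1024). The dictionaries and the helper `matching_backbones` are hypothetical illustrations, not MIMDet APIs, and the widths are not read from the released checkpoints.

```python
from typing import List

# Standard hidden (embedding) widths from the ViT and BERT papers.
# Assumption: the released checkpoints follow these standard configurations.
VIT_WIDTHS = {"ViT-Base": 768, "ViT-Large": 1024, "ViT-Huge": 1280}
TEXT_WIDTHS = {"BERT-base": 768, "BERT-large": 1024}


def matching_backbones(text_encoder: str) -> List[str]:
    """Return the ViT variants whose embedding width equals the text encoder's (hypothetical helper)."""
    width = TEXT_WIDTHS[text_encoder]
    return [name for name, dim in VIT_WIDTHS.items() if dim == width]


if __name__ == "__main__":
    for encoder in TEXT_WIDTHS:
        print(encoder, "->", matching_backbones(encoder))
```

Under these assumptions, a BERT-base text encoder lines up with the ViT-Base models without an extra projection layer, while BERT-large lines up with ViT-Large.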

Pter61 commented 2 years ago

OK! Thank you!