mzhaoshuai / RLCF

[ICLR 2024] Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models.
https://mzhaoshuai.github.io/RLCF/
Apache License 2.0

The weights of ViT-L/14 #6

Closed cht619 closed 3 months ago

cht619 commented 3 months ago

Thanks for your wonderful work. Could you share the weights for ViT-L/14? CoOp does not provide ViT-L/14 weights (dim=768).

mzhaoshuai commented 3 months ago

Hi, thanks for reaching out.

For the TTA model, we do not use ViT-L/14; we use ViT-B/16.


For the reward model, when using ViT-L/14, we use the original OpenAI weights without any further tuning.
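The dimension mismatch the question raises can be made concrete: CLIP backbones use different embedding widths (ViT-B/16 is 512-d, ViT-L/14 is 768-d), so prompt vectors trained for one backbone cannot be loaded into another. A minimal sketch of this compatibility check (the dictionary and helper function are illustrative, not part of the RLCF codebase):

```python
# Embedding width of each CLIP text encoder (from the OpenAI CLIP models).
CLIP_EMBED_DIM = {
    "ViT-B/16": 512,  # backbone used for the TTA model in this repo
    "ViT-L/14": 768,  # backbone used for the reward model, original OpenAI weights
}


def prompt_fits_backbone(prompt_dim: int, backbone: str) -> bool:
    """Return True if a learned prompt vector of width prompt_dim
    matches the token-embedding width of the given CLIP backbone."""
    return CLIP_EMBED_DIM[backbone] == prompt_dim


# A 512-d CoOp prompt loads into ViT-B/16 but not into ViT-L/14,
# which is why no CoOp ViT-L/14 weights exist to share.
print(prompt_fits_backbone(512, "ViT-B/16"))  # True
print(prompt_fits_backbone(512, "ViT-L/14"))  # False
```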

mzhaoshuai commented 3 months ago

Close due to inactivity. Feel free to re-open.