microsoft / Cream

This is a collection of our NAS and Vision Transformer work.

TinyCLIP HuggingFace integration #212

Closed FransHk closed 6 months ago

FransHk commented 6 months ago

Hi team,

I am integrating and comparing the brilliant set of TinyCLIP (ViT-based) architectures with the vanilla CLIP model in a number of language-enabled action recognition frameworks. Some of these AR models rely on HF model configurations; are there plans to release the TinyCLIP family on HF? Thanks!

wkcn commented 6 months ago

Hi @FransHk , thanks for your attention to our work!

Were you referring to integrating TinyCLIP into Hugging Face, like this example? https://huggingface.co/openai/clip-vit-large-patch14

FransHk commented 6 months ago

Hi, thanks for your reply. You are right: I am looking to export a TinyCLIP model in the HuggingFace 'CLIPConfig' format so that I can load it into an existing codebase that expects this format.

For example, the code I am integrating TinyCLIP into loads vanilla CLIP like so:

from transformers import CLIPModel, CLIPConfig

# pretrained_model is a Hugging Face model id or local path, e.g. "openai/clip-vit-large-patch14"
configuration = CLIPConfig.from_pretrained(pretrained_model)
clip_model = CLIPModel.from_pretrained(pretrained_model, config=configuration)

The model that integrates CLIP for AR tasks is built entirely around the HF-formatted CLIP; that's why I'm asking rather than re-implementing it for the open_clip framework. Does this answer your question?

wkcn commented 6 months ago

I see. Thanks @FransHk !

I am working on integrating TinyCLIP into HF. I rarely use transformers and have run into some issues with CLIPConfig, so it will take some time to integrate TinyCLIP :)
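
For reference, a minimal sketch of how a non-standard CLIP variant can be described with transformers' config classes. The dimensions below are placeholders for illustration only, not the actual TinyCLIP hyperparameters:

from transformers import CLIPConfig, CLIPTextConfig, CLIPVisionConfig, CLIPModel

# Placeholder dimensions, NOT the real TinyCLIP values
text_config = CLIPTextConfig(hidden_size=256, num_hidden_layers=6, num_attention_heads=4, intermediate_size=1024)
vision_config = CLIPVisionConfig(hidden_size=384, num_hidden_layers=12, num_attention_heads=6, intermediate_size=1536, patch_size=32)

# Compose the two sub-configs into a single CLIPConfig
config = CLIPConfig.from_text_vision_configs(text_config, vision_config, projection_dim=256)
model = CLIPModel(config)  # randomly initialized; pretrained TinyCLIP weights would still need to be converted and loaded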

FransHk commented 6 months ago

Thank you @wkcn, looking forward to the results!

wkcn commented 6 months ago

Hi @FransHk, I have integrated some TinyCLIP-ViT models into HF.

https://huggingface.co/collections/wkcn/tinyclip-model-zoo-6581aa105311fe07be88cb0d
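A minimal loading sketch for these checkpoints; the model id below is one example from the collection above (substitute any TinyCLIP checkpoint listed there), and the image URL is just a sample input:

from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

# Example checkpoint from the collection linked above
model_id = "wkcn/TinyCLIP-ViT-40M-32-Text-19M-LAION400M"
model = CLIPModel.from_pretrained(model_id)
processor = CLIPProcessor.from_pretrained(model_id)

# Sample zero-shot image-text matching
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)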

FransHk commented 6 months ago

Works like a charm, thanks!