patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License

Using the embedding model in sentence_transformers and/or export to ONNX or TorchScript #26

Closed: saskiabosma closed this issue 5 months ago

saskiabosma commented 8 months ago

Hello!

I'm using this model as a text encoder. Since it is based on a CLIP-ViT-B-32 model, I was hoping to use the sentence_transformers library (which explicitly supports this model family) so I can run it in OpenSearch for scalable vector search, and/or export the model to ONNX or TorchScript.
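For reference, this is the kind of usage that already works with the packaged sentence-transformers checkpoint (the query string is just an example):

```python
from sentence_transformers import SentenceTransformer

# Works out of the box with the officially packaged CLIP checkpoint;
# I'd like the same to work with FashionCLIP.
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")
embeddings = model.encode(["a red floral summer dress"])
print(embeddings.shape)  # (1, 512) for ViT-B/32
```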

(I need the exported ONNX/TorchScript model to take raw text as input, not pre-processed tokens, so the usual export isn't suited to my use case.)

I get the error "'CLIPConfig' object has no attribute 'hidden_size'" at the config.json load step when I try to load the FashionCLIP model in sentence_transformers. I checked the config.json of the CLIP-ViT-B-32 model and, like FashionCLIP's, it has a "hidden_size" field at the same nesting level. Does anyone have an idea of what could be blocking? Any other suggestions for exporting a FashionCLIP text embedder to ONNX/TorchScript would also be very welcome!
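A minimal reproduction (assuming the Hub id patrickjohncyh/fashion-clip):

```python
from sentence_transformers import SentenceTransformer

# Fails while building the model, presumably because the generic
# Transformer/Pooling fallback reads config.hidden_size, which CLIPConfig
# only exposes inside text_config / vision_config:
model = SentenceTransformer("patrickjohncyh/fashion-clip")
# AttributeError: 'CLIPConfig' object has no attribute 'hidden_size'
```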

vinid commented 8 months ago

Hi!

Good question. I will have to take a look.

Not sure if this could be the issue, but:

1) From a quick look, it seems like sentence_transformers supports OpenAI-derived models. I'm not sure whether the fact that our model backbone is LAION CLIP makes a difference.

2) I wonder if the repo should be organized similarly to this one: https://huggingface.co/sentence-transformers/clip-ViT-B-32.

Just speculating, I'll try to give it a shot over the weekend! Let me know if you solve this!

vinid commented 8 months ago

OK, take a look at this: https://colab.research.google.com/drive/1BHNgrjax-4r7DH3sZU56yROJVrKZ7y3c?usp=sharing

I downloaded both models (sentence-transformers' CLIP and FashionCLIP), replicated the directory structure from the sentence-transformers CLIP repo inside FashionCLIP, and was then able to load fashion-clip into sentence_transformers.
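Roughly, the layout and loading look like this (the local directory name is illustrative; the Colab has the exact steps):

```python
# Mirror the layout of sentence-transformers/clip-ViT-B-32 locally:
#
#   fashion-clip-st/
#   ├── modules.json                       # copied from clip-ViT-B-32
#   ├── config_sentence_transformers.json  # copied from clip-ViT-B-32
#   └── 0_CLIPModel/                       # the FashionCLIP weights, config, tokenizer
#
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("./fashion-clip-st")
embeddings = model.encode(["a red floral summer dress"])
print(embeddings.shape)
```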

Please double-check what I did, because I don't know if this actually makes sense (it might be a good idea to open an issue on sentence-transformers to see if someone can confirm this is the correct procedure).

If this works, we can probably push the new folder structure online into another repo (ping @patrickjohncyh).

saskiabosma commented 8 months ago

It works on my side too! You don't even need to change the repo structure: if you replace the subfolder reference with "path": "." in modules.json (and copy modules.json and config_sentence_transformers.json into the main directory where the FashionCLIP artifacts are downloaded), it loads correctly. I also checked that it works with more recent versions of PyTorch and Transformers (2.0.1 and 4.26.1 respectively).
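Something like this, assuming the single-module modules.json copied from clip-ViT-B-32 (the local path is illustrative):

```python
import json
from pathlib import Path

from sentence_transformers import SentenceTransformer

# Directory where the FashionCLIP artifacts were downloaded, with modules.json
# and config_sentence_transformers.json copied in from clip-ViT-B-32.
model_dir = Path("./fashion-clip")

# Point the single CLIPModel module at the top-level directory
# instead of the 0_CLIPModel subfolder.
modules_path = model_dir / "modules.json"
modules = json.loads(modules_path.read_text())
modules[0]["path"] = "."
modules_path.write_text(json.dumps(modules, indent=2))

model = SentenceTransformer(str(model_dir))
```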

I will check sentence-transformers for official guidance on this. Thanks for finding the fix!

ETA: Issue link.

vinid commented 8 months ago

Nice! Let's wait for confirmation on their side, but I think this is the correct approach!