patrickjohncyh / fashion-clip

FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
MIT License

Multi-Lingual CLIP #25

Closed: yaman closed this issue 8 months ago

yaman commented 8 months ago

Hi @vinid, what would happen if we used a multilingual ViT model like clip-ViT-B-32-multilingual to run the same training on the Farfetch dataset?

I have a use case that requires using FashionCLIP in 6 different languages.

vinid commented 8 months ago

Hello!

It could work, though it depends on how much you tune the text encoder. What I would do is fine-tune with the text encoder frozen; that way, its embedded representations remain untouched.
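
A minimal sketch of that freezing step, assuming the Hugging Face `transformers` `CLIPModel` API and the `patrickjohncyh/fashion-clip` checkpoint on the Hub (the optimizer setup and learning rate are purely illustrative):

```python
import torch
from transformers import CLIPModel

# FashionCLIP checkpoint on the Hugging Face Hub.
model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")

# Freeze the text tower and its projection so the text embedding space
# stays fixed; gradients then flow only through the vision side.
for param in model.text_model.parameters():
    param.requires_grad = False
for param in model.text_projection.parameters():
    param.requires_grad = False

# Optimize only the parameters that still require gradients.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)  # illustrative learning rate
```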

You could also try to translate some examples and see how that works!
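
A hedged sketch of that translate-then-embed workaround, doing standard zero-shot scoring with `CLIPProcessor`; the queries, hand-written translations, and image path below are placeholders, not from a real translation system:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("patrickjohncyh/fashion-clip")
processor = CLIPProcessor.from_pretrained("patrickjohncyh/fashion-clip")

queries = ["vestido rojo", "borsa di pelle"]       # original-language queries
translated = ["a red dress", "a leather handbag"]  # hand-written English translations

image = Image.open("product.jpg")  # hypothetical product image
inputs = processor(text=translated, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)

# Report scores against the original-language queries.
print(dict(zip(queries, probs[0].tolist())))
```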

yaman commented 8 months ago

I think I need to learn more about LLM training. Do you have any blog posts or other materials to recommend for retraining fashion-clip for multilingual inference?

vinid commented 8 months ago

Of course! A couple of starting points that might be useful to get into CLIP code and CLIP training:

Some blog posts:

yaman commented 8 months ago

You're terrific, @vinid!

I stumbled upon your article last week; it's in an open tab and on my must-read list. I skimmed through it and appreciate how you've distilled complex fundamentals so effortlessly. It's like a CLIP for dummies :D Thanks for the article.

By the way, I'm crafting a Rust version of fashion-clip (despite being new to Rust) using onnx-runtime, with your model converted via Hugging Face's Optimum (not quantized, though) for fast embedding creation. Here it is: RustEmbed.
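
For reference, a rough sketch of what that Optimum export step might look like; the entry point and output filename are assumptions about Optimum's ONNX exporter (they vary across versions), not RustEmbed's actual pipeline:

```python
# Assumed Optimum export API; exact entry points vary across Optimum versions.
from optimum.exporters.onnx import main_export
import onnxruntime as ort

# Export the FashionCLIP checkpoint from the Hub to ONNX (no quantization).
main_export("patrickjohncyh/fashion-clip", output="fashion-clip-onnx")

# Sanity-check the exported graph by loading it and listing its inputs.
session = ort.InferenceSession("fashion-clip-onnx/model.onnx")  # assumed filename
print([inp.name for inp in session.get_inputs()])
```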

I'd value your feedback on whether I've utilized the fashion-clip model correctly and if RustEmbed piques your interest.

vinid commented 8 months ago

Thanks so much!

RustEmbed looks great! I can't give much feedback because I don't know Rust, but I love the idea! I've added it to our README: https://github.com/patrickjohncyh/fashion-clip/blob/master/README.md#fun-related-projects