monatis / clip.cpp

CLIP inference in plain C/C++ with no extra dependencies
MIT License

Write a better readme #4

Closed · monatis closed this issue 1 year ago

monatis commented 1 year ago

Demonstrate model conversion, detail how to compile, and explain the general API.

Talk about possible usage scenarios, especially the cold start issue.
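For the "general API" part, a minimal sketch like the one below could be a starting point for the README. The function and struct names here are written from memory as assumptions and should be checked against clip.h before use; the point is to show the flow: load the model once (that is the cold-start cost), then encode images and texts and compare them as many times as needed.

```cpp
// readme_sketch.cpp -- illustrative only.
// Function/struct names below are assumptions; verify them against clip.h.
#include "clip.h"

#include <cstdio>

int main() {
    // Loading the converted model file is the expensive cold-start step,
    // so load it once and keep the context alive between requests.
    struct clip_ctx * ctx = clip_model_load("path/to/model.bin", /*verbosity=*/1);
    if (!ctx) {
        fprintf(stderr, "failed to load the model\n");
        return 1;
    }

    const int n_threads = 4;

    // Encode an image: load -> preprocess -> encode.
    // The real API may provide helper constructors for these structs.
    struct clip_image_u8  img_raw;
    struct clip_image_f32 img_prep;
    float img_vec[512]; // 512 = projection dim of ViT-B/32; query it from the model in real code
    clip_image_load_from_file("cat.png", &img_raw);
    clip_image_preprocess(ctx, &img_raw, &img_prep);
    clip_image_encode(ctx, n_threads, &img_prep, img_vec, /*normalize=*/true);

    // Encode a text: tokenize -> encode.
    struct clip_tokens tokens;
    float txt_vec[512];
    clip_tokenize(ctx, "a photo of a cat", &tokens);
    clip_text_encode(ctx, n_threads, &tokens, txt_vec, /*normalize=*/true);

    // Cosine similarity between the two normalized embeddings.
    const float score = clip_similarity_score(img_vec, txt_vec, 512);
    printf("similarity: %.4f\n", score);

    clip_free(ctx);
    return 0;
}
```

A README note next to such a snippet could point out that everything up to and including the model load is the part to keep warm in a long-running process.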

fire commented 1 year ago

I would like to help write this up. Can you give section headers?

monatis commented 1 year ago

hi @fire,

Thanks for offering a hand in this. My considerations for this issue are as follows:

Feel free to contribute to any of them.

fire commented 1 year ago

Motivation for the clip.cpp Project

CLIP helps computers understand images and text together. It's used in many areas, such as searching for an image online or having a computer describe what's in an image without any human help.

What's Special About This Project?

fire commented 1 year ago

A more appealing visualization, including a header with, for example, an icon for the license, etc. -- unfortunately I'm not a visual guy :D

I would also like a video showing the command being typed in a terminal with a PNG path, side by side with the photo itself, and the result being returned.

Kwisss commented 1 year ago

My understanding is that this could be used with a BLIP caption model, such as 'blip-base', for zero-shot image labeling. Is that correct?

I think this project could gain a lot of traction if we can get ViT-bigG-14 and ViT-L-14/openai working. These are the CLIP models used for text encoding during SDXL training. (ref)

It would be amazing to get blip-base and blip2-2.7b working. I haven’t looked into the papers to find out which caption model they used.

monatis commented 1 year ago

> this could be used with a blip caption model

Yes, BLIP and other large multimodal models are a CLIP feature extractor + some bridging mechanism that projects CLIP hidden states into the language model's embedding space + a large language model like OPT, Vicuna, T5, etc. This will be another project, see #31
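For reference, plain zero-shot labeling doesn't need BLIP at all: encode the image once, encode each candidate label as text, and rank the labels by similarity. A rough sketch is below, again with assumed function names (check clip.h for the real signatures) and a hardcoded 512-dim embedding size for ViT-B/32.

```cpp
// zero_shot_sketch.cpp -- illustrative only; function names are assumptions,
// not necessarily the exact clip.cpp API.
#include "clip.h"

#include <cstdio>
#include <string>
#include <vector>

int main() {
    struct clip_ctx * ctx = clip_model_load("path/to/model.bin", /*verbosity=*/1);
    if (!ctx) return 1;

    const int n_threads = 4;
    const int vec_dim   = 512; // ViT-B/32 projection dim; query it from the model in real code

    // Encode the image only once, then reuse its embedding for every label.
    struct clip_image_u8  img_raw;
    struct clip_image_f32 img_prep;
    std::vector<float> img_vec(vec_dim);
    clip_image_load_from_file("photo.png", &img_raw);
    clip_image_preprocess(ctx, &img_raw, &img_prep);
    clip_image_encode(ctx, n_threads, &img_prep, img_vec.data(), /*normalize=*/true);

    const std::vector<std::string> labels = {
        "a photo of a cat", "a photo of a dog", "a photo of a car"};

    // Encode each candidate label and keep the best-scoring one.
    float  best_score = -1.0f;
    size_t best_idx   = 0;
    std::vector<float> txt_vec(vec_dim);
    for (size_t i = 0; i < labels.size(); ++i) {
        struct clip_tokens tokens;
        clip_tokenize(ctx, labels[i].c_str(), &tokens);
        clip_text_encode(ctx, n_threads, &tokens, txt_vec.data(), /*normalize=*/true);

        const float score = clip_similarity_score(img_vec.data(), txt_vec.data(), vec_dim);
        if (score > best_score) {
            best_score = score;
            best_idx   = i;
        }
    }

    printf("best label: %s (score %.4f)\n", labels[best_idx].c_str(), best_score);

    clip_free(ctx);
    return 0;
}
```

The difference from a caption model is that the label set is fixed up front here, whereas BLIP-style models generate free-form text, which is why they need the extra language model on top.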

> we can get ViT-bigG-14 and ViT-L-14/openai working

Large OpenAI and OpenCLIP variants are already working in this project. But Stable Diffusion is a long story of its own. It's also another project that I want to use clip.cpp in, but yeah, the level of traction is also important for devoting time to all of it.