I would like to help write this up. Can you give section headers?
hi @fire,
Thanks for offering a hand in this. My considerations for this issue are as follows:
Feel free to contribute to any of them.
clip.cpp
Project: CLIP helps computers understand images and text together. It's used in many areas, like when you search for an image online or when a computer needs to describe what's in an image without any help.
Size: This project is very small; it can use multi-modal models as small as 85.6 MB. This means clip.cpp can be used on devices that don't have a lot of storage space.
Startup Time: clip.cpp starts up quickly. This is important because programs can take a long time to start, and on servers and phones a fast cold start is crucial.
A more appealing visualization, including a header with, for example, badges for the license etc. Unfortunately, I'm not a visual guy :D
I would like a video showing the command being typed in a terminal with a PNG, side by side with the photo, and the result being returned.
My understanding is that this could be used with a BLIP caption model, such as ‘blip-base’, for zero-shot image labeling. Is that correct?
I think this project could gain a lot of traction if we can get ViT-bigG-14 and ViT-L-14/openai working. These are the CLIP models used for text encoding during SDXL training. (ref)
It would be amazing to get blip-base and blip2-2.7b working. I haven’t looked into the papers to find out which caption model they used.
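To make "zero-shot image labeling" concrete: CLIP embeds the image and each candidate label's text into the same vector space, and the label whose embedding is closest to the image embedding wins. Below is a minimal, self-contained sketch of just that scoring step; the embeddings are dummy values for illustration, whereas in practice they would come from clip.cpp's image and text encoders.

```cpp
#include <cmath>
#include <cstdio>
#include <string>
#include <vector>

// Cosine similarity between two embedding vectors of equal length.
static float cosine_similarity(const std::vector<float> & a, const std::vector<float> & b) {
    float dot = 0.0f, na = 0.0f, nb = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-8f);
}

// Zero-shot labeling: pick the label whose text embedding is most similar
// to the image embedding.
int main() {
    std::vector<std::string> labels = {"a photo of a cat", "a photo of a dog"};

    // Dummy 4-dimensional embeddings for illustration only;
    // real CLIP embeddings are typically 512/768/1024-dimensional.
    std::vector<float> image_emb = {0.1f, 0.9f, 0.2f, 0.1f};
    std::vector<std::vector<float>> text_embs = {
        {0.1f, 0.8f, 0.3f, 0.0f},   // "a photo of a cat"
        {0.9f, 0.1f, 0.0f, 0.2f},   // "a photo of a dog"
    };

    size_t best = 0;
    float best_score = -1.0f;
    for (size_t i = 0; i < text_embs.size(); ++i) {
        float score = cosine_similarity(image_emb, text_embs[i]);
        printf("%-24s %.3f\n", labels[i].c_str(), score);
        if (score > best_score) { best_score = score; best = i; }
    }
    printf("predicted label: %s\n", labels[best].c_str());
    return 0;
}
```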
this could be used with a blip caption model
Yes, BLIP and other large multimodal models are a CLIP feature extractor + some bridging mechanism that projects CLIP hidden states into the language model's embedding space + a large language model like OPT, Vicuna, T5, etc. This will be another project, see #31
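To make that composition concrete, here is a rough structural sketch of such a captioning pipeline; every type and function below is a hypothetical placeholder to show how the pieces connect, not a clip.cpp or BLIP API.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical building blocks for illustration; none of these are
// real clip.cpp or BLIP APIs. They only show how the pieces compose.
struct VisionEncoder {
    // Stand-in for a CLIP-style image encoder returning hidden states.
    std::vector<float> encode(const std::string & /*image_path*/) { return {0.1f, 0.2f, 0.3f}; }
};
struct Projector {
    // Stand-in for the bridging module that maps CLIP hidden states
    // into the language model's embedding space.
    std::vector<float> project(const std::vector<float> & h) { return h; }
};
struct LanguageModel {
    // Stand-in for a decoder LLM (OPT, Vicuna, T5, ...) conditioned on the prefix.
    std::string generate(const std::vector<float> & /*prefix*/) { return "a cat sitting on a sofa"; }
};

// BLIP-style captioning, conceptually:
//   image -> CLIP-like vision encoder -> projection -> LLM decodes a caption.
int main() {
    VisionEncoder vision;
    Projector     bridge;
    LanguageModel llm;

    std::vector<float> hidden = vision.encode("example.png"); // CLIP hidden states
    std::vector<float> prefix = bridge.project(hidden);       // bridge to LLM embedding space
    printf("caption: %s\n", llm.generate(prefix).c_str());
    return 0;
}
```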
we can get ViT-bigG-14 and ViT-L-14/openai working
Large OpenAI and OpenCLIP variants are already working in this project. But Stable Diffusion is a long story of its own; it's another project that I want to use clip.cpp in, but yeah, the level of traction matters when deciding how much time to devote to all of it.
Demonstrate model conversion, detail how to compile, and explain the general API (a rough sketch of the API flow follows below).
Talk about possible usage scenarios, especially the cold start issue.
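For the API part, something along these lines could serve as the running example in the write-up. I'm recalling the function names from clip.h, so treat every name and signature below as an assumption to verify against the current header rather than as the actual API.

```cpp
// Sketch of an encode-and-compare flow with clip.cpp. Function names and
// signatures are recalled from clip.h and are assumptions; check the header
// in the repo before copying this into the write-up.
#include <cstdio>
#include "clip.h"

int main() {
    // Placeholder path to a converted GGML model (produced by the conversion script).
    struct clip_ctx * ctx = clip_model_load("models/clip-vit-base-patch32.f16.bin", 1);
    if (!ctx) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // Image side: load, preprocess, encode into an embedding vector.
    clip_image_u8  img_u8;
    clip_image_f32 img_f32;
    float img_vec[512];                               // 512 = ViT-B/32 projection dim
    clip_image_load_from_file("example.png", &img_u8);
    clip_image_preprocess(ctx, &img_u8, &img_f32);
    clip_image_encode(ctx, /*n_threads=*/4, &img_f32, img_vec);

    // Text side: tokenize and encode a candidate label.
    clip_tokens tokens;
    float txt_vec[512];
    clip_tokenize(ctx, "a photo of a cat", &tokens);
    clip_text_encode(ctx, /*n_threads=*/4, &tokens, txt_vec);

    // Similarity between the two embeddings drives zero-shot labeling / image search.
    float score = clip_similarity_score(img_vec, txt_vec, 512);
    printf("similarity: %.3f\n", score);

    clip_free(ctx);
    return 0;
}
```

The compile step and the conversion script can then be documented around this example (the project builds with CMake, so a standard configure-and-build flow should be enough to reproduce it).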