Inference of Large Multimodal Models in C/C++
This is still a work in progress and not ready for use.
This repo implements LLaVA inference in C/C++ on top of clip.cpp and llama.cpp. Eventually, it will support inference of other large multimodal models, but LLaVA is chosen as a starting point.
clip.cpp
LLaVA. Initially, it should be a two-file format: one for the visual encoder and the other for LLaMA.
instructblip.