While the basic structure for multimodality integration is there in the code, I can't find a suitable model to run it with. For most models, either the projector is too low-resolution (~400 px) or the underlying LLM is too weak to be usable. The only open-source multimodal LLM that has a high enough resolution, is smart enough, and is good enough at OCR seems to be OpenGVLab/InternVL, but that model is far too large to run on anything I have access to.
If new models come out that meet the above requirements, please let me know about them in this issue. Thanks!