haseeb-heaven / code-interpreter

An innovative open-source Code Interpreter with (GPT, Gemini, Claude, LLaMa) models.
https://pypi.org/project/open-code-interpreter/
MIT License
216 stars 40 forks

Support for Multiple Vision Models with easy Interface #19

Open sharoseali opened 3 days ago

sharoseali commented 3 days ago

Hi @haseeb-heaven, first, thanks for your great work tying all the common LLMs together in a single tool. It is much appreciated.

I was looking for Vision AI models. The documentation says it currently supports open-source and Vision AI models and Vision APIs. Is there any possibility of adding a UI, maybe a web page running on localhost, for working with the LLMs: writing prompts, uploading images, a drop-down to switch between LLMs, and text fields for entering their API keys?

haseeb-heaven commented 3 days ago

Hi. Currently it only supports GPT Vision and Google AI Vision models; others have not been added yet. As for the interface, it is CLI only, so for vision models you would need to modify it to add a UI for image upload or chat. The interface is simple to use for all users, but for additional features like image upload and a chat interface we could create a new interface. That would take a lot of redesigning of the interpreter.py and interpreter_lib.py classes.

sharoseali commented 1 day ago

Thanks for the reply. Any plans to upgrade this with Gemini? Open-source vision models like LLaVA and GUI support would make Code-Interpreter a more general-purpose and interesting tool.

haseeb-heaven commented 1 day ago

This was supposed to be a command-line tool only, for ease of use, but for images we can have a separate branch with a GUI to make it more usable.