gradusnikov / eclipse-chatgpt-plugin

An Eclipse plugin that integrates with ChatGPT
MIT License

Llama support? #31

Open hellfire7707 opened 4 months ago

hellfire7707 commented 4 months ago

Do you have plans to support other LLM models like Llama 3?

Or would it be easy to modify the code that implements the interface to OpenAI? I would like an interface using Ollama.

Any hints would be appreciated.

gradusnikov commented 4 months ago

Hi @hellfire7707

I have not used Ollama directly, but as far as I can see it offers OpenAI API chat completions compatibility (https://github.com/ollama/ollama/blob/main/docs/openai.md), so it should work out of the box with this Eclipse plugin. Alternatively, you can use a model loader such as LM Studio, which implements the OpenAI API endpoint. Then it is just a matter of adding an LLM endpoint configuration in the AssistAI preferences.
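For reference, here is a minimal sketch (not part of the plugin) of the OpenAI-style chat completions request that Ollama serves on its default port. If a plain call like this works against your local Ollama instance, an OpenAI-compatible client such as this plugin should be able to talk to it as well. The model name `llama3` and the dummy bearer token are assumptions for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaOpenAiCheck {
    public static void main(String[] args) throws Exception {
        // OpenAI-style chat completions payload; "llama3" is assumed to be a
        // model you have already pulled locally (e.g. with `ollama pull llama3`).
        String body = """
                {
                  "model": "llama3",
                  "messages": [
                    {"role": "system", "content": "You are a helpful coding assistant."},
                    {"role": "user", "content": "Say hello in one short sentence."}
                  ]
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                // Ollama's OpenAI-compatible endpoint on the default port 11434.
                .uri(URI.create("http://localhost:11434/v1/chat/completions"))
                .header("Content-Type", "application/json")
                // Ollama does not check the API key, but OpenAI-style clients usually send one.
                .header("Authorization", "Bearer ollama")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```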

boessu commented 4 months ago

Hello @hellfire7707

It is possible to register Ollama as a ChatGPT endpoint with http://localhost:11434/v1/chat/completions (turn off Vision and Function Calls, as these are not supported by Ollama). I've got the best results with llama3. The code-specific models (codellama, stable-code, starcoder2) don't seem to have enough semantic training for the prompts configured in this plugin.
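As a quick sanity check before pointing the plugin at that URL, you can ask Ollama's native API which models are actually installed; the exact names it returns (e.g. "llama3:latest") are what you would use in the endpoint configuration. A rough sketch using Ollama's documented /api/tags endpoint:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OllamaListModels {
    public static void main(String[] args) throws Exception {
        // Ollama's native API lists locally installed models at /api/tags.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/tags"))
                .GET()
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response is JSON with a "models" array; any name listed there
        // can be used as the model in the chat completions endpoint config.
        System.out.println(response.body());
    }
}
```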

@gradusnikov I would be happy if you could have a look at Ollama when you get the time. There are a lot of VS Code AI plugins which support Ollama besides ChatGPT, and these plugins seem to produce better results than this one (I don't know what the difference to your plugin is). Keeping the code private via a local offline model certainly has its value, and depending on your hardware it can even be remarkably faster.

jukofyork commented 4 months ago

llama.cpp has its own OpenAI compatible API endpoints now:

https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md

I've found Ollama to be incredibly buggy, and it's often very hard to see what is actually getting sent to the server (e.g. ask it to refactor some code before and after saying "hi" and clearing the context: with temperature=0 this should give the same results, but it doesn't!). There was also a bug where the system message wasn't passed to the API that went unnoticed for a few months. Overall it isn't a good backend to use if you care about getting the most out of the models.

As for the models to use, I've found wizard-lm-2 to be by far the best for C, C++ and Java. This company ran some benchmarks recently:

https://old.reddit.com/r/LocalLLaMA/comments/1clfahu/we_benchmarked_30_llms_across_26_languages_using/

https://prollm.toqan.ai/leaderboard

which also back this up.

If you don't have much VRAM, then deepseek-coder and phind-codellama are the next best, I think (for C, C++ and Java, anyway).

hellfire7707 commented 3 months ago

My apologies for the late response.

I used LM Studio as described on the main page instead of Ollama, and it worked fine.

Thank you for all your efforts.