Closed paul-gauthier closed 2 months ago
Why not have a unified API that you provide, then a plugin system to integrate other LLMs? Then you could just provide the GPT-3.5 and GPT-4 plugins officially yourself.
Per the FAQ and info above in the issue, there are already some useful hooks for connecting to other LLMs. A number of users have been using them to experiment with local LLMs. Most reports haven't been very enthusiastic about how they perform with aider compared to GPT-3.5/4.
Aider does have a modular system for building different "coder" backends, allowing customization of prompts and edit formats.
So all the raw materials seem to be available to get aider working with specific new models. I'm always happy to advise as best I can. And if you see evidence that a particular model has the potential to work well with aider, that would be exciting news.
Great work! Has anyone confirmed successful usage with other local LLMs? It's not about cheapness; we still don't have access to the ChatGPT API, nor are we able to pay for any alternatives.
Hello! Great news! There is another contender in the ecosystem of open source LLM coders.
Yesterday I tested the quantized version https://huggingface.co/TheBloke/NewHope-GGML (TheBloke also released a GPTQ version), running locally using oobabooga's text-generation-webui with the openai extension activated. From what I read, it is a Llama 2 model fine-tuned in a similar way to WizardCoder.
It seems to be better than WizardCoder, but it still needs effort to adjust the prompts to make it usable. Oops... the group that originally released the model had to remove it because they realized that some of the data used to evaluate quality had slipped into the training data, so the comparative results they were reporting weren't real. Even so, TheBloke's quantized model is still there and can be downloaded and tested.
The NewHope model was retracted because it was contaminated with test data causing overfit.
https://twitter.com/mathemagic1an/status/1686814347287486464?s=46&t=hIokEbug9Pr72tQFuXVULA
So far none of the models come close to GPT, and none can follow instructions well enough to work with aider.
@aldoyh
Hey @paul-gauthier, while this question isn't directly related to using other LLMs, I was wondering if you have advice on where to poke around to embed additional context into the prompt.
My friend and I are putting together a context embedding for some relevant, up-to-date developer documentation and would love to try aider in conjunction with that context.
Thanks for this tool!
I had to research that before replying, so here's my answer: no, I didn't know about it, and I'm going to check it out. But even if I deployed LocalAI, isn't there a way to point Aider at it?
@aldoyh Have a look at my comments in https://github.com/paul-gauthier/aider/issues/138
Appreciate your work on this project. This represents one of the biggest missing pieces between LLMs and real code-writing utility, so well done.
I read the bits about how hard it is to add new models; I just want to request that you take a look at Claude 2. V2 added some big improvements on the code side, and while it is still dumber than GPT-4, it has a more recent knowledge cutoff, plus a massive context window.
I understand you are working around context limitations with ctags, but it could be interesting to see whether there is an advantage to loading the entire project into context with Claude. For example, it may be better at answering high-level questions, or at writing features described in more abstract terms. But regardless, I think that Claude is hot on the heels of GPT-4, and if the reporting on it being a 52B model is true, then it is already significantly smarter (pound for pound).
Just my 2c anyway
I agree that the Claude models sound like they are the most likely to be capable of working well with aider. I have been waiting for an api key for months unfortunately. My impression is that it is very difficult to get a key, which limits the benefits of integrating Claude into aider. Not many folks could use it.
Paul, perhaps try OpenRouter, which seems to sidestep the key issue and gives access to Claude directly.
Yes, I am aware of openrouter. But that is a confusing extra layer to explain to users. Most users won't have direct Claude api access. And I won't be able to test aider directly against the Claude api. It's all sort of workable, but far from ideal.
I like the idea of an easily extendable system, e.g. a flag (--bot llama5) that exports a class with this structure:
```javascript
export default class LLaMA5 {
  // Parameters this bot needs before it can be constructed
  requirements = [
    {
      id: "apiKey",
      name: "API Key",
      type: "string",
      required: true,
    },
  ];

  constructor({ apiKey }) {
    this.apiKey = apiKey;
  }

  createConversation() {
    // returns ConvoParams
  }

  sendMessage(message, { conversation, progress, done }) {}

  // Other optional methods such as deleteConversation, deleteMessage,
  // editMessage, retryMessage, etc.
}
```
This would be easily inspectable by Aider to check if this bot supports retrying, editing, etc as well as supporting the required parameters.
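In aider's own language, the same idea might look like the sketch below. These are hypothetical names, not an existing aider API; the point is just that declared requirements plus `getattr`-based introspection let the host check what a plugin supports:

```python
class LLaMA5Bot:
    """Hypothetical plugin declaring its required parameters."""

    requirements = [
        {"id": "api_key", "name": "API Key", "type": "string", "required": True},
    ]

    def __init__(self, api_key):
        self.api_key = api_key

    def send_message(self, message, conversation=None):
        # A real plugin would call the model's API here.
        raise NotImplementedError


def supports(bot, capability):
    """Check whether a plugin implements an optional method like retry_message."""
    return callable(getattr(bot, capability, None))


bot = LLaMA5Bot(api_key="dummy")
print(supports(bot, "send_message"))   # True
print(supports(bot, "retry_message"))  # False
```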
When it comes to the coding capabilities of local LLMs, I believe that HumanEval (pass@1) is the most important metric.
The leaderboard lists "Starcoder-16b" as the best open model with a score of 0.336, compared to GPT-4's score of 0.67.
We also have two GPT-3.5 models at 0.48/0.46. But here's the thing: there's also "WizardLM-70B-V1.0", which no one has added to the leaderboards yet, and it actually has a higher score than GPT-3.5 at 0.506.
I don't have a machine powerful enough to run it, but I think that with minor tweaking it should perform as well as GPT-3.5.
All this being said, I'm not a dev, I haven't tested any of this, and I honestly don't fully understand all the steps that autonomous agents like Aider take to get things to work.
Just thought I'd mention it in case it's useful to someone.
And Paul, great work with everything here. Really cool
Edit: There's also WizardCoder-15B-V1.0, with a score of 0.573, which is what I originally came to mention but somehow forgot along the way while checking sources.
I agree that the Claude models sound like they are the most likely to be capable of working well with aider. I have been waiting for an api key for months unfortunately. My impression is that it is very difficult to get a key, which limits the benefits of integrating Claude into aider. Not many folks could use it.
Hey Paul,
I'd be happy to lend you my API key to use for testing. There's a max of 1 call at a time, so if you can deal with that limitation - all good!
I'd be happy to lend you my API key to use for testing.
Thanks @JamesSKR. I have a loaner API key already. But again, so few people have Claude API access that it's not going to be very impactful to get aider working with Claude. Almost no one could use it. I definitely want to experiment with Claude, but it's not super high priority right now for that reason.
Adding support for the recently released Code Llama (perhaps using cria?) would be very interesting imo. What do you think @paul-gauthier?
Hi Paul, thank you for such a great project. Love what you've done so far. I was also wondering if you've tested the PaLM API from Google, just wondering if it's any good?
@samuelmukoti I tested the PaLM models a little while working on the openrouter integration. They were ok, but similar to Llama they needed a bit of coaxing to output responses in a format aider would understand.
Just tested out the main branch with text-generation-webui's openAI API endpoint and it worked right away.
Here it is with Llamacoder:
Wow, that's exciting. Hope you can conduct further tests and share how the performance compares to GPT-3.5.
thanks for sharing
I wonder if there's a way to get ctags working 🤔
@sammcj Have you tried asking it to edit code?
@sammcj, how did you set up the API for the Llamacoder? I am interested in giving this a try. Thanks
ollama is by far the easiest way!!
@paul-gauthier Did you have a sample file and prompt that you'd like to provide to compare it to something you'd run against GPT-3.5? I can try it out and provide the results.
@jdvaugha I have a heap of different LLM tools on my home server, but the one I seem to use the most is https://github.com/oobabooga/text-generation-webui, however as mentioned Ollama is a very easy way to get started.
@sammcj Try the tests in the Examples folder. Here is one of them https://github.com/paul-gauthier/aider/blob/main/examples/hello-world-flask.md
FYI, you can use llama.cpp to run local models behind an OpenAI-compatible server with little or no code modification:
https://github.com/ggerganov/llama.cpp/discussions/795
I've yet to try it, but I'm excited to try this with Code Llama.
Just set the OPENAI_API_BASE environment variable.
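This works because any server speaking the OpenAI chat-completions wire format looks the same to the client. Conceptually, the requests are just POSTs like the stdlib-only sketch below (the local URL is a placeholder; the model name is whatever the server maps it to):

```python
import json
import urllib.request


def chat_request(base_url, model, messages):
    """Build a POST request in the OpenAI /chat/completions wire format."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",  # local servers typically ignore the key
        },
        method="POST",
    )


# Pointing at a local llama.cpp server instead of api.openai.com:
req = chat_request(
    "http://localhost:8000/v1",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "hello"}],
)
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```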
Here is how I run it. I have a script called server.sh to fire up the server:

```shell
export HOST=0.0.0.0 && python3 -m llama_cpp.server --n_threads 4 --model models/wizardcoder-python-34b-v1.0.Q2_K.gguf --n_ctx 16384
```

Then run aider from bash:

```shell
aider --openai-api-base=http://192.168.0.101:8000/v1 --openai-api-key=dummy --model=gpt-3.5-turbo --edit-format whole -v calculator.c --no-pretty --verbose
```

You may want to change the model in the line above to --model=gpt-3.5-turbo-16k to take advantage of the larger context window.
@apcameron
Can you try this with the new Code Llama model?
"The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens."
https://ai.meta.com/blog/code-llama-large-language-model-coding/
https://www.geeky-gadgets.com/how-to-install-code-llama-locally/
@marvin-hansen I have tried it with https://huggingface.co/TheBloke/CodeLlama-34B-Instruct-GGUF but I found that the Wizardcoder was better for me. You need to try the different models and see which is best for your use case.
I wonder if there's a way to get ctags working 🤔
change the model to gpt-4 maybe
💪😃 Got it working with the oobabooga text-generation web UI, using the --share link as the API replacement. Model: TheBloke_WizardCoder-Python-34B-V1.0-GPTQ
Can you share a small video on how to?
Hi @xb3sox, would you mind showing the settings you used for the API using oobabooga? I keep getting an error.
Error:

```
raise error.APIError(
openai.error.APIError: HTTP code 404 from API (the response body was an HTML 404 page)
Error code: 404
Message: Not Found.
Error code explanation: 404 - Nothing matches the given URI.
```
API settings:
Enable the openai setting as well. That should enable port 5001 which is what you use.
https://github.com/paul-gauthier/aider/assets/4990091/8e0918be-1a83-4b3f-9ee0-07c757b6a766
@jdvaugha @lvalics
Here is the video. You can pass this flag in the terminal when you start text-generation-webui:

```shell
python server.py --extensions openai
```

The endpoint for local use:
http://127.0.0.1:5001/v1 or http://localhost:5001/v1

To use the endpoint from anywhere with text-generation-webui:

```shell
python server.py --extensions openai --share
```

This will create a temporary share link, and you can access the web UI and the OpenAI endpoint from anywhere.
To get better results, use a better GPU ⚡💵
Thank you. Today I will set up a server on RunPod and see if I can get that going, so as not to use my local machine. It will be faster, I hope.
opeai-w-aider.mp4
Not all heroes wear capes. ❤️🔥💯🙏
It is working via RunPod; I needed to change 127.0.0.1 to 0.0.0.0 in some places, but it is working fast. Now I need to test with my local projects.
Tested more. I can ask it to create a Tetris game and it works, but if I add existing code and ask it to do something, I get no response.
man, this is making me excited!
The text-generation-webui route is not working for me with ExLlamaV2; aider produces garbage (unprintable characters). It seems to work with V1.
EDIT: This seems to be model specific. I've switched out WizardCoder-34B GPTQ (4) for CodeLlama-34B-instruct-4.0bpw-h6-exl2 and now it works fine. But it's still really strange; in either case my curl calls to the OpenAI API layer of text-generation-webui work fine. /EDIT
I've also run into "The chat session is larger than the context window!" a few times when playing around. Is there any way to fix that? WizardCoder-34B doesn't appear to be smart enough to produce the edit blocks in diff mode when instructed to, so I guess I'll run into that a lot?
EDIT: This is due to text-generation-webui enforcing a 2048 token limit that takes some config changes to override. /EDIT
@Chainfire There is a contributor called BigArt who wrote a homemade server (FastAPI, uvicorn) which loads a model with ExLlama (see the discussion on the Discord thread). He uses a hack to handle out-of-context errors, pruning the first few lines of the chat when approaching the context limit.
However, I did not manage to get the API working on Ubuntu; I keep getting errors from aider like: Added diamond.py to the chat. Invalid response object from API: '{"detail":"Not Found"}' (HTTP response code was 404)
Not sure why. Maybe there are different conventions w.r.t. a Windows setup (I know BigArt is using Windows).
If someone has more information on that, I am highly interested.
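That pruning trick is conceptually simple. Here is a minimal Python sketch of the idea (not BigArt's actual code; the helper names and the crude character-based token estimate are stand-ins):

```python
def prune_messages(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the chat fits the context window."""
    msgs = list(messages)
    while msgs and sum(count_tokens(m["content"]) for m in msgs) > max_tokens:
        # Keep the system prompt; drop the oldest user/assistant message.
        for i, m in enumerate(msgs):
            if m["role"] != "system":
                del msgs[i]
                break
        else:
            break  # only system messages left; nothing more to prune
    return msgs


# Crude token estimate: roughly 1 token per 4 characters.
est = lambda text: len(text) // 4

chat = [
    {"role": "system", "content": "s" * 40},      # ~10 tokens
    {"role": "user", "content": "u" * 400},       # ~100 tokens
    {"role": "assistant", "content": "a" * 40},   # ~10 tokens
]
print(len(prune_messages(chat, 30, est)))  # 2: the oldest user message was dropped
```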
@ystoll I saw BigArt's server code, yeah, but I haven't run it so far. I've continued my journey into messing with text-generation-webui. I've changed a bunch of settings (and even code!) and now it actually somewhat works.
I'm seeing my best results so far from manual testing (system prompts now work correctly, context size is correct, etc.), but it's still not good enough for aider use. I'm currently switching out CodeLlama-34B for airoboros-c34b-2.1, which should be a better fit for exact instruction following, but no download is currently available in a format that works for me, so I'm doing the conversion/quantizing myself, which is a trip and will take a while.
If I do get that to work, I'll have to figure out how to get the benchmarks running. I tried yesterday with CodeLlama-34B but couldn't get the benchmark image to start, let alone produce results.
@Chainfire On my side, I managed to start the benchmark but keep getting several out-of-context errors.
I load the model via a JSON payload which I send to the text-generation-webui API (see for instance api). I will try your hack and give you feedback on it. Thank you!
@ystoll Are you talking to text-generation-webui manually?

```shell
~/llama-env/bin/python server.py --list --api --extensions openai --trust-remote-code
aider --openai-api-base=http://127.0.0.1:5001/v1 --openai-api-key dummy --model gpt-3.5-turbo-16k
```

Doesn't the benchmark provide the same options via environment?
EDIT: Note that the hack is only to improve the prompts. You can change context window settings and such in a local config file and load it with the --settings parameter to server.py. I'll write all this up once (if) I get it working well enough to bother.
This issue is a catch-all for questions about using aider with other or local LLMs. The text below is taken from the FAQ.
Aider provides experimental support for LLMs other than OpenAI's GPT-3.5 and GPT-4. The support is currently only experimental for two reasons:
Numerous users have done experiments with numerous models. None of these experiments have yet identified other models that look like they are capable of working well with aider.
Once we see signs that a particular model is capable of code editing, it would be reasonable for aider to attempt to officially support such a model. Until then, aider will simply maintain experimental support for using alternative models.
More information
For more information on connecting to other models, local models and Azure models please see the FAQ.
There are ongoing discussions about LLM integrations in the aider discord.
Here are some GitHub issues which may contain relevant information.