Closed paul-gauthier closed 2 months ago
Why not have a unified API that you provide, then a plugin system to integrate other LLMs? Then you could just provide the GPT-3.5 and GPT-4 plugins officially yourself.
Per the FAQ and info above in the issue, there are already some useful hooks for connecting to other LLMs. A number of users have been using them to experiment with local LLMs. Most reports haven't been very enthusiastic about how they perform with aider compared to GPT-3.5/4.
Aider does have a modular system for building different "coder" backends, allowing customization of prompts and edit formats.
So all the raw materials seem to be available to get aider working with specific new models. I'm always happy to advise as best I can. And if you see evidence that a particular model has the potential to work well with aider, that would be exciting news.
Great work! Has anyone confirmed successful usage with other local LLMs? It's not about cheapness; we still don't have access to the ChatGPT API, nor are we able to pay for any alternatives.
Hello! Great news! There is another contender in the ecosystem of open source LLM coders.
Yesterday I tested the quantized version https://huggingface.co/TheBloke/NewHope-GGML (TheBloke also released a GPTQ version), running locally using oobabooga's text-generation-webui with the openai extension activated. From what I read, it is a Llama 2 model fine-tuned in a similar way to WizardCoder.
It seems to be better than WizardCoder, but it still needs effort to adjust the prompts to make it usable. Oops... the group that originally released the model had to remove it because they realized that some of the data used to evaluate quality had slipped into the training data, so the comparative results they were reporting weren't real. Even so, TheBloke's quantized model is still there and can be downloaded and tested.
The NewHope model was retracted because it was contaminated with test data causing overfit.
https://twitter.com/mathemagic1an/status/1686814347287486464?s=46&t=hIokEbug9Pr72tQFuXVULA
So far none of the models come close to GPT, and none can follow instructions well enough to work with aider.
@aldoyh
Hey @paul-gauthier, while this question isn't directly related to using other LLMs, I was wondering if you have advice on where to poke around to embed additional context into the prompt.
My friend and I are putting together a context embedding for some relevant, up-to-date developer documentation and would love to try aider in conjunction with that context.
Thanks for this tool!
I had to research that before replying, so here's my answer: no, I didn't know about it, and I'm going to check it out. But even if I deployed LocalAI, isn't there a way to point Aider at it?
@aldoyh Have a look at my comments in https://github.com/paul-gauthier/aider/issues/138
Appreciate your work on this project. This represents one of the biggest missing pieces between LLMs and real code-writing utility, so well done.
I read the bits about how hard it is to add new models; I just want to request that you take a look at Claude 2. V2 added some big improvements on the code side, and while it is still dumber than GPT-4, it has a more recent knowledge cutoff, plus a massive context window.
I understand you are working around context limitations with ctags, but it could be interesting to see whether there is an advantage to loading the entire project into context with Claude. For example, it may be better at answering high-level questions, or at writing features described in more abstract terms. But regardless, I think that Claude is hot on the heels of GPT-4, and if the reporting on it being a 52B model is true, then it is already significantly smarter (pound for pound).
Just my 2c anyway
I agree that the Claude models sound like they are the most likely to be capable of working well with aider. I have been waiting for an api key for months unfortunately. My impression is that it is very difficult to get a key, which limits the benefits of integrating Claude into aider. Not many folks could use it.
Paul, perhaps try OpenRouter, which seems to sidestep the key issue and gives access to Claude directly.
Yes, I am aware of openrouter. But that is a confusing extra layer to explain to users. Most users won't have direct Claude api access. And I won't be able to test aider directly against the Claude api. It's all sort of workable, but far from ideal.
I like the idea of an easily extendable system, e.g. a flag (--bot llama5) that exports a class with this structure:
```javascript
export default class LLaMA5 {
  // Parameters this bot needs before it can be constructed
  requirements = [
    {
      id: "apiKey",
      name: "API Key",
      type: "string",
      required: true,
    },
  ];

  constructor({ apiKey }) {
    this.apiKey = apiKey;
  }

  createConversation() {
    // returns ConvoParams
  }

  sendMessage(message, { conversation, progress, done }) {}

  // Other optional methods such as deleteConversation, deleteMessage,
  // editMessage, retryMessage, etc.
}
```
This would be easily inspectable by Aider to check if this bot supports retrying, editing, etc as well as supporting the required parameters.
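In aider's own language, the same idea might look like the sketch below. These are hypothetical names, not an existing aider API; the point is just that declared requirements plus `getattr`-based introspection let the host check what a plugin supports:

```python
class LLaMA5Bot:
    """Hypothetical plugin declaring its required parameters."""

    requirements = [
        {"id": "api_key", "name": "API Key", "type": "string", "required": True},
    ]

    def __init__(self, api_key):
        self.api_key = api_key

    def send_message(self, message, conversation=None):
        # A real plugin would call the model's API here.
        raise NotImplementedError


def supports(bot, capability):
    """Check whether a plugin implements an optional method like retry_message."""
    return callable(getattr(bot, capability, None))


bot = LLaMA5Bot(api_key="dummy")
print(supports(bot, "send_message"))   # True
print(supports(bot, "retry_message"))  # False
```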
When it comes to the coding capabilities of local LLMs, I believe that HumanEval (pass@1) is the most important metric.
The leaderboard lists "Starcoder-16b" as the best open model with a score of 0.336, compared to GPT-4's score of 0.67.
We also have two GPT-3.5 models at 0.48/0.46. But here's the thing: there's also "WizardLM-70B-V1.0", which no one has added to the leaderboards yet, and it actually has a higher score than GPT-3.5 at 0.506.
I don't have a machine powerful enough to run it, but I think that with minor tweaking it should perform as well as GPT-3.5.
All this being said, I'm not a dev, I haven't tested any of this, and I honestly don't fully understand all the steps that autonomous agents like Aider take to get things to work.
Just thought I'd mention it in case it's useful to someone.
And Paul, great work with everything here. Really cool
Edit: There's also WizardCoder-15B-V1.0, with a score of 0.573, which is what I originally came to mention but somehow forgot along the way while checking sources.
I agree that the Claude models sound like they are the most likely to be capable of working well with aider. I have been waiting for an api key for months unfortunately. My impression is that it is very difficult to get a key, which limits the benefits of integrating Claude into aider. Not many folks could use it.
Hey Paul,
I'd be happy to lend you my API key to use for testing. There's a max of 1 call at a time, so if you can deal with that limitation - all good!
I'd be happy to lend you my API key to use for testing.
Thanks @JamesSKR. I have a loaner API key already. But again, so few people have Claude API access that it's not going to be very impactful to get aider working with Claude. Almost no one could use it. I definitely want to experiment with Claude, but it's not super high priority right now for that reason.
Adding support for the recently released Code Llama (perhaps using cria?) would be very interesting imo. What do you think @paul-gauthier?
Hi Paul, thank you for such a great project. Love what you've done so far. I was also wondering if you've tested the PaLM API from Google, just wondering if it's any good?
@samuelmukoti I tested the PaLM models a little while working on the openrouter integration. They were ok, but similar to Llama they needed a bit of coaxing to output responses in a format aider would understand.
Just tested out the main branch with text-generation-webui's openAI API endpoint and it worked right away.
Here it is with Llamacoder:
Wow, that's exciting. Hope you can conduct further tests and share how the performance compares to GPT-3.5.
thanks for sharing
I wonder if there's a way to get ctags working 🤔
@sammcj Have you tried asking it to edit code?
@sammcj, how did you set up the API for the Llamacoder? I am interested in giving this a try. Thanks
ollama is by far the easiest way!!
@paul-gauthier Did you have a sample file and prompt that you'd like to provide to compare it to something you'd run against GPT-3.5? I can try it out and provide the results.
@jdvaugha I have a heap of different LLM tools on my home server, but the one I seem to use the most is https://github.com/oobabooga/text-generation-webui, however as mentioned Ollama is a very easy way to get started.
@sammcj Try the tests in the Examples folder. Here is one of them https://github.com/paul-gauthier/aider/blob/main/examples/hello-world-flask.md
FYI, you can use llama.cpp to run local models behind an OpenAI-compatible server with little or no code modification:
https://github.com/ggerganov/llama.cpp/discussions/795
I've yet to try it, but I'm excited to try this with Code Llama.
Just set the OPENAI_API_BASE environment variable.
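This works because any server speaking the OpenAI chat-completions wire format looks the same to the client. Conceptually, the requests are just POSTs like the stdlib-only sketch below (the local URL is a placeholder; the model name is whatever the server maps it to):

```python
import json
import urllib.request


def chat_request(base_url, model, messages):
    """Build a POST request in the OpenAI /chat/completions wire format."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps({"model": model, "messages": messages}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer dummy",  # local servers typically ignore the key
        },
        method="POST",
    )


# Pointing at a local llama.cpp server instead of api.openai.com:
req = chat_request(
    "http://localhost:8000/v1",
    "gpt-3.5-turbo",
    [{"role": "user", "content": "hello"}],
)
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```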
Here is how I run it. I have a script called server.sh to fire up the server:

```shell
export HOST=0.0.0.0 && python3 -m llama_cpp.server --n_threads 4 --model models/wizardcoder-python-34b-v1.0.Q2_K.gguf --n_ctx 16384
```

Then run aider from bash:

```shell
aider --openai-api-base=http://192.168.0.101:8000/v1 --openai-api-key=dummy --model=gpt-3.5-turbo --edit-format whole -v calculator.c --no-pretty --verbose
```

You may want to change the model in the line above to --model=gpt-3.5-turbo-16k to take advantage of the larger context window.
@apcameron
Can you try this with the new Code Llama model?
"The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens."
https://ai.meta.com/blog/code-llama-large-language-model-coding/
https://www.geeky-gadgets.com/how-to-install-code-llama-locally/
@marvin-hansen I have tried it with https://huggingface.co/TheBloke/CodeLlama-34B-Instruct-GGUF but I found that the Wizardcoder was better for me. You need to try the different models and see which is best for your use case.
I wonder if there's a way to get ctags working 🤔
change the model to gpt-4 maybe
💪😃 Got it working with the oobabooga text-generation web UI, using the --share link as the API replacement. Model: TheBloke_WizardCoder-Python-34B-V1.0-GPTQ
Can you share a small video on how to?
Hi @xb3sox, would you mind showing the settings you used for the API using oobabooga? I keep getting an error.
Error:

```
raise error.APIError(
openai.error.APIError: HTTP code 404 from API (the response body was an HTML 404 page)
Error code: 404
Message: Not Found.
Error code explanation: 404 - Nothing matches the given URI.
```
API settings:
Enable the openai setting as well. That should enable port 5001 which is what you use.
https://github.com/paul-gauthier/aider/assets/4990091/8e0918be-1a83-4b3f-9ee0-07c757b6a766
@jdvaugha @lvalics
Here is the video. You can pass this flag in the terminal when you start text-generation-webui:

```shell
python server.py --extensions openai
```

The endpoint for local use:
http://127.0.0.1:5001/v1 or http://localhost:5001/v1

To use the endpoint from anywhere with text-generation-webui:

```shell
python server.py --extensions openai --share
```

This will create a temporary share link, and you can access the web UI and the OpenAI endpoint from anywhere.
To get better results, use a better GPU ⚡💵
Thank you. Today I will set up a server on RunPod and see if I can get that going, so as not to use my local machine. It will be faster, I hope.
opeai-w-aider.mp4
Not all heroes wear capes. ❤️🔥💯🙏
It is working via RunPod; I needed to change 127.0.0.1 to 0.0.0.0 in some places, but it is working fast. Now I need to test with my local projects.
Tested more. I can ask it to create a Tetris game and it works, but if I add existing code and ask it to do something, I get no response.
man, this is making me excited!
The text-generation-webui route is not working for me with ExLlamaV2; aider produces garbage (unprintable characters). It seems to work with V1.
EDIT: This seems to be model specific. I've switched out WizardCoder-34B GPTQ (4) for CodeLlama-34B-instruct-4.0bpw-h6-exl2 and now it works fine. But it's still really strange; in either case my curl calls to the OpenAI API layer of text-generation-webui work fine. /EDIT
I've also run into "The chat session is larger than the context window!" a few times when playing around. Is there any way to fix that? WizardCoder-34B doesn't appear to be smart enough to produce the edit blocks in diff mode when instructed to, so I guess I'll run into that a lot?
EDIT: This is due to text-generation-webui enforcing a 2048 token limit that takes some config changes to override. /EDIT
@Chainfire There is a contributor called BigArt who wrote a homemade server (FastAPI, uvicorn) which loads a model with ExLlama (see the discussion on the Discord thread). He uses a hack to handle out-of-context errors, pruning the first few lines of the chat when approaching the context limit.
However, I did not manage to get the API working on Ubuntu; I keep getting errors from aider like: Added diamond.py to the chat. Invalid response object from API: '{"detail":"Not Found"}' (HTTP response code was 404)
Not sure why. Maybe there are different conventions w.r.t. a Windows setup (I know BigArt is using Windows).
If someone has more information on that, I am highly interested.
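That pruning trick is conceptually simple. Here is a minimal Python sketch of the idea (not BigArt's actual code; the helper names and the crude character-based token estimate are stand-ins):

```python
def prune_messages(messages, max_tokens, count_tokens):
    """Drop the oldest non-system messages until the chat fits the context window."""
    msgs = list(messages)
    while msgs and sum(count_tokens(m["content"]) for m in msgs) > max_tokens:
        # Keep the system prompt; drop the oldest user/assistant message.
        for i, m in enumerate(msgs):
            if m["role"] != "system":
                del msgs[i]
                break
        else:
            break  # only system messages left; nothing more to prune
    return msgs


# Crude token estimate: roughly 1 token per 4 characters.
est = lambda text: len(text) // 4

chat = [
    {"role": "system", "content": "s" * 40},      # ~10 tokens
    {"role": "user", "content": "u" * 400},       # ~100 tokens
    {"role": "assistant", "content": "a" * 40},   # ~10 tokens
]
print(len(prune_messages(chat, 30, est)))  # 2: the oldest user message was dropped
```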
@ystoll I saw BigArt's server code, yeah, but I haven't run it so far. I've continued my journey into messing with text-generation-webui. I've changed a bunch of settings (and even code!) and now it actually somewhat works.
I'm seeing my best results so far from manual testing (system prompts now work correctly, context size is correct, etc.), but it's still not good enough for aider use. I'm currently switching out CodeLlama-34B for airoboros-c34b-2.1, which should be a better fit for exact instruction following, but no download is currently available in a format that works for me, so I'm doing the conversion/quantizing myself, which is a trip and will take a while.
If I do get that to work, I'll have to figure out how to get the benchmarks running. I tried yesterday with CodeLlama-34B but couldn't get the benchmark image to start, let alone produce results.
@Chainfire On my side, I managed to start the benchmark but keep getting several out-of-context errors.
I load the model via a JSON payload which I send to the text-generation-webui API (see for instance api). I will try your hack and give you feedback on it. Thank you!
@ystoll Are you talking to text-generation-webui manually?

```shell
~/llama-env/bin/python server.py --list --api --extensions openai --trust-remote-code
aider --openai-api-base=http://127.0.0.1:5001/v1 --openai-api-key dummy --model gpt-3.5-turbo-16k
```

Doesn't the benchmark provide the same options via environment?
EDIT: Note that the hack is only to improve the prompts. You can change context window settings and such in a local config file and load it with the --settings parameter to server.py. I'll write all this up once (if) I get it working well enough to bother.
This issue is a catch-all for questions about using aider with other or local LLMs. The text below is taken from the FAQ.
Aider provides experimental support for LLMs other than OpenAI's GPT-3.5 and GPT-4. The support is currently only experimental for two reasons:
Numerous users have done experiments with numerous models. None of these experiments have yet identified other models that look like they are capable of working well with aider.
Once we see signs that a particular model is capable of code editing, it would be reasonable for aider to attempt to officially support such a model. Until then, aider will simply maintain experimental support for using alternative models.
More information
For more information on connecting to other models, local models and Azure models please see the FAQ.
There are ongoing discussions about LLM integrations in the aider discord.
Here are some GitHub issues which may contain relevant information.