paul-gauthier / aider

aider is AI pair programming in your terminal
https://aider.chat/
Apache License 2.0

No more extra pay for chatgpt please #138

Closed jordanshow10000 closed 1 year ago

jordanshow10000 commented 1 year ago

Hi, just wondering: with the new open-source models coming out, like Llama 2 or anything else at that level on Hugging Face, why are we still using the API from OpenAI? If I can operate at the same level as GPT-4 using an open-source model, why are we still using OpenAI? Assuming, of course, I'm downloading the models locally as well.

paul-gauthier commented 1 year ago

Thanks for trying aider!

I have not yet seen evidence that any other model can code as well as GPT-4. Aider has experimental support for hooking up to local models. This FAQ entry has more information:

https://aider.chat/docs/faq.html#can-i-use-aider-with-other-llms-local-llms-etc
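
For context, the hookup mechanism is just the OpenAI client's configurable API base: aider sends ordinary chat-completion requests to whatever endpoint OPENAI_API_BASE points at. A minimal sketch of that mechanism, assuming the pre-1.0 openai Python package; the URL and key below are placeholders:

import openai

# Point the client at a local OpenAI-compatible server instead of
# api.openai.com. Both values below are placeholders.
openai.api_key = "dummy"  # most local servers ignore the key
openai.api_base = "http://localhost:5001/v1"

resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # the name is passed through to the local backend
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp["choices"][0]["message"]["content"])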

Ichigo3766 commented 1 year ago

We have a model like WizardCoder-Guanaco, which is a great coding model. It's able to understand the context and generate code very well, and it has an 8k context. I have been using it with great success; a lot of the code works on the first try. There is even a code-completion extension for VS Code for this model.

apcameron commented 1 year ago

Here is a proof of concept for you to test, using a model hosted locally or on Google Colab. Since we are going to patch the aider code, clone a fresh copy to a folder on your computer. Then apply the patch below to set max_tokens for the context window (without an explicit max_tokens, the local OpenAI-compatible server may default to a very short completion limit).

diff --git a/aider/coders/base_coder.py b/aider/coders/base_coder.py
index 35cd6e2..7567bd7 100755
--- a/aider/coders/base_coder.py
+++ b/aider/coders/base_coder.py
@@ -645,6 +645,7 @@ class Coder:
         kwargs = dict(
             model=model,
             messages=messages,
+            max_tokens=4096,
             temperature=0,
             stream=self.stream,
         )

Then install it with python -m pip install -e . from inside the clone.

Next create a file called llama_2_13b_chat_ggml.ipynb and place the contents below in the file

{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [],
      "metadata": {
        "id": "8CMMJAZkLGLS"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "VCFOzsQSHbjM"
      },
      "outputs": [],
      "source": [
        "%cd /content\n",
        "!apt-get -y install -qq aria2 git-lfs\n",
        "\n",
        "!git clone -b v1.8 https://github.com/camenduru/text-generation-webui\n",
        "%cd /content/text-generation-webui\n",
        "!pip install -r requirements.txt\n",
        "!pip install sentence_transformers openai tiktoken flask_cloudflared\n",
        "!pip uninstall --yes llama-cpp-python\n",
        "!CMAKE_ARGS=\"-DLLAMA_CUBLAS=on\" FORCE_CMAKE=1 pip install llama-cpp-python\n",
        "\n",
        "\n",
        "!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q5_1.bin -d /content/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GGML/ -o  llama-2-13b-chat.ggmlv3.q5_1.bin\n",
        "\n",
        "%cd /content/text-generation-webui\n",
        "!python server.py --share --chat --n-gpu-layers 40 --loader llama.cpp --model /content/text-generation-webui/models/TheBloke_Llama-2-13B-chat-GGML --disk --n_ctx 4096  --api --trust-remote-code --extensions openai --verbose"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "gpuClass": "standard",
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}

Next, go to Google Colab (https://colab.research.google.com/), upload the file llama_2_13b_chat_ggml.ipynb, and run it.

Make sure to follow the progress by scrolling down. After about 5 minutes you will see a line similar to OPENAI_API_BASE=https://fs-authors-studying-carlo.trycloudflare.com/v1

You need to use that for aider when you run it.
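
Before launching aider, you can sanity-check that the tunnel is actually serving the API. A minimal sketch using the requests package; the base URL is the placeholder from above, and the assumption is that the openai extension exposes the usual /v1/models listing:

import requests

# Substitute the OPENAI_API_BASE value printed in your Colab session.
base = "https://fs-authors-studying-carlo.trycloudflare.com/v1"

# A 200 response with a model list means the endpoint is reachable.
resp = requests.get(base + "/models", timeout=30)
print(resp.status_code)
print(resp.json())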

Make sure you are in a git-enabled folder, then run aider with the following command (update the --openai-api-base value with the URL provided by Colab):

aider --openai-api-base=https://fs-authors-studying-carlo.trycloudflare.com/v1 --openai-api-key=dummy --model=gpt-3.5-turbo --edit-format whole -v calculator.c

Once you get to the calculator.c prompt, enter the following as a test:

Write the source code for a simple calculator in standard C with the standard four functions. Request each input on a separate line. Request the operator on a separate line. Include logic to prevent division by zero. Make sure that the code is fully functional, that it will compile and run successfully, and that it has all the code needed.

The result after a few minutes should be an updated calculator.c that you can compile and test.

NOTE: as aider was not written for this model, it may not always work as expected, but this test case works just fine. These steps can be adapted to test other models as well.

MAKE SURE TO DISCONNECT AND DELETE THE RUNTIME ON GOOGLE COLAB ONCE DONE

funkytaco commented 1 year ago

Thanks, this seems to work. A bit slower than OpenAI, but it's free, so.

What do you mean by this:

MAKE SURE TO DISCONNECT AND DELETE THE RUNTIME ON GOOGLE COLAB ONCE DONE

funkytaco commented 1 year ago

Hmm, it didn't actually write the code, but did print a diff, so it "kinda worked".


This code should now be fully functional and compilable in standard C. Let me know if you have any questions or need further clarification!

The chat session is larger than the context window!

Approximate context window usage, in tokens:

     271 system messages 
      88 chat history    use /clear to clear
       7 calculator.c    use /drop to drop from chat
========
     366 tokens total
   3,730 tokens remaining in context window
   4,096 tokens max context window size

To reduce token usage:
 - Use /drop to remove unneeded files from the chat session.
 - Use /clear to clear chat history.
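
If you want to see how close a file is to the limit before starting a session, you can count tokens yourself; the Colab notebook above already installs tiktoken. A minimal sketch, with the caveat that tiktoken implements OpenAI's tokenizers, so the count is only an approximation for Llama 2:

import tiktoken

# Rough estimate only: gpt-3.5-turbo's encoding is not Llama 2's tokenizer.
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
with open("calculator.c") as f:
    source = f.read()
print(len(enc.encode(source)), "tokens in calculator.c")
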
funkytaco commented 1 year ago

Ah, I was pointing at the system aider. I had to put this in my aider-llama2 git clone's parent directory; I called it free_aider and ran it as an executable: ./free_aider

e.g.: ./free_aider --openai-api-base=https://fs-authors-studying-carlo.trycloudflare.com/v1 --openai-api-key=dummy --model=gpt-3.5-turbo --edit-format whole -v calculator.c

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import re
import sys

# Put the repo clone (the folder that contains the `aider` package) on the
# import path; adjust './aider' to match your clone's folder name.
sys.path.insert(0, './aider')
from aider.main import main

if __name__ == '__main__':
    # Strip a -script.pyw or .exe suffix from argv[0], mirroring the
    # wrapper scripts that pip generates for console entry points.
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())
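
(Alternatively, running python -m pip install -e . inside the patched clone, as described earlier, puts the patched aider entry point on your PATH and makes a wrapper like this unnecessary.)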
funkytaco commented 1 year ago

@apcameron check out the test project. Did you have this problem with file names? https://github.com/funkytaco/react-test-aider-LLaMa2

apcameron commented 1 year ago

Thanks, this seems to work. A bit slower than OpenAI, but it's free, so.

What do you mean by this:

MAKE SURE TO DISCONNECT AND DELETE THE RUNTIME ON GOOGLE COLAB ONCE DONE

This is just a reminder to delete the runtime and disconnect from Google Colab, as they only give you a limited amount of time each day. If you leave it running, it may use up all your time for the day.

apcameron commented 1 year ago

@apcameron check out the test project. Did you have this problem with file names? https://github.com/funkytaco/react-test-aider-LLaMa2

Yes, it does not always get the filenames correct, but the calculator example always works correctly for me.

The aider code may need updating to be compatible with Llama 2, or we may need to find a better model.

apcameron commented 1 year ago

@paul-gauthier Have you had a chance to try this?

joshuavial commented 1 year ago

I've done a bunch of testing of aider against the following models - it wasn't too hard to get them running (using text-generation-webui) and 'the hax' from @tmm1:

https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ
https://huggingface.co/TheBloke/Llama-2-13B-GPTQ
https://huggingface.co/TheBloke/WizardCoder-15B-1.0-GPTQ
https://huggingface.co/TheBloke/llama2-7b-chat-codeCherryPop-qLoRA-GPTQ

Short version: they were all too underpowered to be useful. All but the 15B WizardCoder model would get confused on the file names and consistently put stuff in /path/to/hello.py.

I got an OK hello world out of WizardCoder, but then instructions to modify it would generate new files instead of edits to the old file (it wouldn't understand the instructions to only edit the previous file). I quickly got beyond its ability with instructions like saying hello to the first arg passed (which would error if no arg was passed in), and pasting the error message into the chat would just get it to acknowledge there was a problem, congratulate me for finding it, and make no effort to actually fix it.

My impression at this stage is that you want to test against at least a 30B-plus model (downloading a few now) to get one which can understand the file instructions. From what I've seen so far, I don't think tweaking the prompts will get much traction on the smaller models.

apcameron commented 1 year ago

@joshuavial Perhaps you can try this one: https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.1-GPTQ

joshuavial commented 1 year ago

I'll add it to the list - I'm currently planning on TheBloke/starcoderplus-GPTQ and TheBloke/FreeWilly2-GPTQ

I'm also planning on setting up a runpod VPS to test out the 70b llama-2 at some stage.

apcameron commented 1 year ago

It's a pity that Petals does not have an OpenAI API yet. See https://chat.petals.dev/

I opened request https://github.com/petals-infra/chat.petals.dev/issues/20

paul-gauthier commented 1 year ago

FYI, we just added an #llm-integrations channel on the discord, as a place to discuss using aider with alternative or local LLMs.

https://discord.gg/X9Sq56tsaR

funkytaco commented 1 year ago

@apcameron check out the test project. Did you have this problem with file names? https://github.com/funkytaco/react-test-aider-LLaMa2

Yes, it does not always get the filenames correct, but the calculator example always works correctly for me.

The aider code may need updating to be compatible with Llama 2, or we may need to find a better model.

Hmm, well an update.

Basic hack: I just typed this as my first line: RULE: Ask for the filename before writing any files. It seems to use "path/to/filename", but that's better than a sentence for the filename ("path/to/" becomes literal directories, so it doesn't realize this is an example path, I guess).

I removed that flag that prints the SYSTEM text. Maybe I'll take another look when I'm not tired.

paul-gauthier commented 1 year ago

Closing this for now. See #172 for more info.

sciencehas commented 1 year ago

not all of us use discord

funkytaco commented 1 year ago

not all of us use discord

Not everyone used IRC in the '90s, either. Those people learned another way.

sciencehas commented 1 year ago

LOL, the '90s? We used ICQ. And before that, nothing. We programmed on Commodore 64s or ICONs with Watcom, and Apple II or IIgs, and everything in between to where we are now. I can adapt; I'm just sick of switching back and forth and having my personal info on a tonne of sites, with new potential for creepy peeps, whenever I need to ask questions. And when Discord first appeared, I saw it on the dark web as a major form of communication, other than ProtonMail or scary chatrooms I would never go into, as I am not stupid and obviously not born yesterday lol. But it seems OK, so I'm trying it.

sciencehas commented 1 year ago

I wonder, has anyone tried ctransformers' AutoTokenizer and AutoModelForCausalLM with the model "bigcode/starcoderplus"?
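
For anyone curious, here is roughly what that looks like. One caveat: ctransformers loads GGML weights, so the sketch below assumes a GGML conversion such as TheBloke/starcoderplus-GGML rather than the raw bigcode/starcoderplus checkpoint; the repo name and generation settings are illustrative:

from ctransformers import AutoModelForCausalLM

# Assumption: a GGML build of StarCoderPlus; adjust the repo (and
# model_file, if you want a specific quantization) to what you download.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/starcoderplus-GGML",
    model_type="starcoder",
)
print(llm("def fibonacci(n):", max_new_tokens=128))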

sciencehas commented 1 year ago

The description sounds OK.

sciencehas commented 1 year ago

Maybe not, but I'm going to try a few and report back. Am I just old, or does anyone else feel guilty consulting GPT-4 on the fine details of replacing itself?