jackMort / ChatGPT.nvim

ChatGPT Neovim Plugin: Effortless Natural Language Generation with OpenAI's ChatGPT API
Apache License 2.0

use local model #276

Closed ValValu closed 8 months ago

ValValu commented 1 year ago

It would be nice to be able to use https://huggingface.co/Phind/Phind-CodeLlama-34B-v1 locally.

shnee commented 12 months ago

+1

I was looking to use something like https://github.com/getumbrel/llama-gpt#openai-compatible-api which is supposed to be compatible with the OpenAI API.

Maybe allowing the OpenAI URL to be configurable would be enough?

EDIT: Looks like the URL is already configurable! rtfm

Set a custom OpenAI API host with the configuration option api_host_cmd or the environment variable $OPENAI_API_HOST. It's useful if you can't access OpenAI directly.
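
For reference, a minimal sketch of that configuration, assuming the plugin's standard require("chatgpt").setup() entry point (the host and port are placeholders for your own server):

    -- Sketch: point ChatGPT.nvim at a self-hosted, OpenAI-compatible endpoint.
    -- api_host_cmd runs a shell command whose stdout is used as the API host.
    require("chatgpt").setup({
      api_host_cmd = "echo -n http://localhost:3001",
    })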

rolandtannous commented 11 months ago

@shnee did this work out with a local model?

shnee commented 11 months ago

I haven't given it a try yet, but I plan to tinker with it soon. I promise to report back.

shnee commented 11 months ago

I gave it a try today with limited success. I had to change the protocol in api.lua to http because I'm connecting to a self-hosted instance, and I changed the model in that file as well.

I was able to get responses from the local model but they were all strange...

"What is the capitol of the United States?" Response: "The capital city of the United States is Washington D. nobody knows."

"What is the capitol of France?" Response: "The capital city of France is Paris. Hinweis: Das Hauptstadt der Frankreich ist Paris." [German: "Note: The capital of France is Paris."]

I copied the requests that this plugin is sending:

curl -X POST --silent --show-error --no-buffer \
    http://192.168.1.204:3001/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <snip>' \
    -d '{"model":"llama-2-7b-chat.bin",
           "messages":[{"role":"user","content":"What is the capitol of Ohio?"}],
           "n":1,"top_p":1,"presence_penalty":0,"max_tokens":300,"temperature":0,"frequency_penalty":0,"stream":true}'

Running that by hand, I got the same strange results. However, once I also sent a system "prompt" message, the responses were more normal:

"messages":[{"role":"system","content":"You are a helpful assistant."},{"role":"user","content":"What is the capitol of France?"}]

With that change, I started to get normal results back:

The capital city of France is Paris.

There also appears to be an issue with saving the chat responses: the plugin seems to save only the requests. I haven't had time to dig into that yet; if I find some more time I'll investigate further.
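
A config-side sketch of the model change, so api.lua doesn't have to be patched by hand; this assumes the installed version reads the model name from an openai_params table passed to setup() (the model name below matches the curl call above):

    -- Sketch: override the model from the config instead of editing api.lua.
    -- Assumes setup() accepts an openai_params table.
    require("chatgpt").setup({
      openai_params = {
        model = "llama-2-7b-chat.bin", -- must match the model your local server serves
        max_tokens = 300,
        temperature = 0,
      },
    })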

darkacorn commented 11 months ago

https://github.com/GenAiWizards/ChatGPT.nvim

So far so good. In that fork, http/https is now part of the OPENAI_API_HOST env var, with a fallback to localhost:5001 if it is unset.

Prompts need to be adjusted, but I do that as I go.
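
With that fork, the variable can also be set from Neovim itself before the plugin reads it; a sketch using Neovim's standard vim.env API (the URL is a placeholder):

    -- Set the host, scheme included, before ChatGPT.nvim reads the env var.
    vim.env.OPENAI_API_HOST = "http://localhost:5001"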

teto commented 9 months ago

I wanted to try a local model via https://github.com/getumbrel/llama-gpt#openai-compatible-api, so I set api_host_cmd = "echo -n '0.0.0.0:3000'", but that triggers curl: (3) URL rejected: Port number was not a decimal number between 0 and 65535. Not sure if that's a server or client error. Is there a way to increase the debug log level? It could also be helpful to have a health.lua to run :checkhealth with.
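
One plausible cause is the missing scheme: without it, the URL built from 0.0.0.0:3000 can be misparsed by curl. A sketch of the workaround, in line with the working config reported later in this thread (host and port are placeholders):

    -- Sketch: include the scheme so curl parses host and port unambiguously.
    require("chatgpt").setup({
      api_host_cmd = "echo -n 'http://0.0.0.0:3000'",
    })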

eitamal commented 9 months ago

You might want to take a look at https://github.com/mudler/LocalAI while you're at it. I haven't tried it myself, but it's meant to be a drop-in replacement for OpenAI's API with support for many models; I believe it even supports Phind-CodeLlama-34B-v2 via TheBloke's GGUF port.

sg1fan commented 9 months ago

I got this working locally by using the config option api_host_cmd = 'echo -n http://localhost:5000' while running https://github.com/oobabooga/text-generation-webui, but I'd imagine it works with any server that supports the OpenAI API.