Closed: tvraman closed this issue 5 months ago
A `gptel-backend` corresponds to an LLM service (local or remote), which can provide multiple models, so the slot `(gptel-backend-models some-backend)` is a list. In your example, it's the list `'("zephyr")`. This is why a separate variable, `gptel-model`, is needed. The README's suggestion for configuring Ollama is to set both explicitly:
```emacs-lisp
(setq-default gptel-model "mistral:latest" ; Pick your default model
              gptel-backend (gptel-make-ollama "Ollama" :host ...))
```
I can make it so `gptel-model` is not limited to the OpenAI models in Customize, and so you can use `setopt` etc. Is this a sufficient solution, or do you have any ideas for better structuring the model configuration?
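For illustration, a combined `setopt` configuration might then look like the sketch below. This assumes the Customize type of `gptel-model` has been relaxed as described above; the Ollama host and the "zephyr" model are example values taken from this thread, not required settings:

```emacs-lisp
;; Sketch: set the backend and default model together with setopt.
;; The host and model name are example values from this thread.
(setopt gptel-backend (gptel-make-ollama "Ollama"
                        :host "localhost:11434"
                        :stream t
                        :models '("zephyr"))
        gptel-model "zephyr")
```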
I'm assuming their goal might be the same as mine: not having to go into textgen, Ollama, or llama.cpp to change the model, as I currently have to. If not, I can create a separate issue for that.
Yeah, I just tried with the textgen backend:

```emacs-lisp
(defvar gptel--textgen-webui-openai
  (gptel-make-openai
   "textgen-webui--openai"
   :stream t
   :protocol "http"
   :host "localhost:5000"
   :models '("LoneStriker_Nous-Hermes-2-Yi-34B-3.0bpw-h6-exl2"
             "dolphin-2.1-mistral-7b.Q6_K.gguf"))
  "GPTel backend for the textgen webui OpenAI-compatible server.")
```
It just uses the model that is loaded.
The UI implies to me though that it will load the model I select.
I guess it's really a bit of an impedance mismatch between local and hosted LLMs: locally, you often start the server yourself, and may already have one running.
I'm starting to see the value in using different models, say dolphin for general use, llama-code-34b for coding, etc.
I'm not sure what a good solution would look like here or if that's something you feel is appropriate for gptel.
> It just uses the model that is loaded.
> The UI implies to me though that it will load the model I select.
I'm confused here. Are you talking about the transient menu UI where you have to select a backend+model every time you want to change the model?
Yes. My original thought was:

* add `gptel--textgen-webui-openai`
* add a list of models in its model directory
* the model I select in the gptel transient menu gets loaded

Thinking further, I can see problems with that, so I can see why it wouldn't work for local LLMs... at least, I think it doesn't? I guess I'm asking for clarification.
> Yes. My original thought was:
>
> * add gptel--textgen-webui-openai
> * add a list of models in its model directory
> * the model I select in the gptel transient menu gets loaded
Isn't this how it works right now?
If I don't explicitly go into text-generation-web-ui and load a model, I get an error in emacs from gptel:
```
Debugger entered--Lisp error: (wrong-type-argument integer-or-marker-p nil)
  pulse-momentary-highlight-region(34 nil)
  gptel-curl--stream-cleanup(#<process gptel-curl> "exited abnormally with code 18\n")
```
And in text-generation-webui I also see:
```
Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/usr/local/lib/python3.10/dist-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 233, in __call__
    async with anyio.create_task_group() as task_group:
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 597, in __aexit__
    raise exceptions[0]
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 221, in stream_response
    async for data in self.body_iterator:
  File "/workspace/text-generation-webui/extensions/openai/script.py", line 127, in generator
    for resp in response:
  File "/workspace/text-generation-webui/extensions/openai/completions.py", line 521, in stream_chat_completions
    for resp in chat_completions_common(body, is_legacy, stream=True):
  File "/workspace/text-generation-webui/extensions/openai/completions.py", line 296, in chat_completions_common
    prompt = generate_chat_prompt(user_input, generate_params)
  File "/workspace/text-generation-webui/modules/chat.py", line 172, in generate_chat_prompt
    while len(messages) > 0 and get_encoded_length(prompt) > max_length:
  File "/workspace/text-generation-webui/modules/text_generation.py", line 160, in get_encoded_length
    return len(encode(prompt)[0])
  File "/workspace/text-generation-webui/modules/text_generation.py", line 119, in encode
    raise ValueError('No tokenizer is loaded')
ValueError: No tokenizer is loaded
```
Maybe text-generation-webui's OpenAI API server doesn't support this, or has a bug?
Yeah, I guess it's a problem or lack of support on their end. I just tested with:
```sh
curl http://localhost:5000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "dolphin-2.7-mixtral-8x7b.Q5_0.gguf",
    "messages": [{"role": "user", "content": "Say this is a test!"}],
    "temperature": 0.7
  }'
```
Got the same error as above.
Found this:

> Best is to load the model using the Web UI. Once the model is loaded, the API will pick it up.
https://github.com/oobabooga/text-generation-webui/issues/4534#issuecomment-1814549672
> The UI implies to me though that it will load the model I select.
I'm still confused about what the issue is. IIUC, choosing a model in gptel does not influence the model that text-generation-webui actually uses. gptel can't ascertain the availability of the model at the endpoint specified in `gptel-make-openai`, especially since every backend (Ollama, GPT4All, and now text-generation-webui) uses a different method to make a model available.
Perhaps gptel could test whether the backend is "online", so to speak, by sending a test message and noting the status of the response. This is the most we can reasonably do from Emacs. Starting and running these various local LLM providers (Ollama etc.) from Emacs is a whole other package's worth of details, and it would need constant upkeep too, as they're being actively developed.
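As a rough sketch of such an "online" check: the helper name `my/gptel-backend-online-p` below is hypothetical, not part of gptel, and it assumes the backend struct's `gptel-backend-host` and `gptel-backend-protocol` accessors:

```emacs-lisp
(require 'url)

;; Hypothetical helper, not part of gptel: probe the backend's host over
;; HTTP and report whether it answered anything within two seconds.
(defun my/gptel-backend-online-p (backend)
  "Return non-nil if BACKEND's host responds to an HTTP request."
  (let ((url (format "%s://%s/"
                     (or (gptel-backend-protocol backend) "http")
                     (gptel-backend-host backend))))
    (condition-case nil
        (and (url-retrieve-synchronously url 'silent 'inhibit-cookies 2) t)
      (error nil))))
```

Note this only checks that something is listening, not that a model (or tokenizer) is actually loaded, which is the failure mode in the traceback above.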
This is a separate issue from @tvraman's concern, which is yet to be resolved. For further discussion of this, please open a thread in discussions instead.
@tvraman Joao Tavora added "model sanitization" to gptel in 7b19cdf1. It now checks if `gptel-model` is one of the models specified in the backend, and if it isn't, sets it to the first valid model from the backend before sending the query.
```emacs-lisp
(setq-default gptel-backend (gptel-make-ollama "Ollama"
                              :host "localhost:11434"
                              :models '("zephyr")
                              :stream t))
```