cfregly opened 1 year ago
```
InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///home/ubuntu/chat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/home/ubuntu/chat/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/home/ubuntu/chat/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/home/ubuntu/chat/src/hooks.server.ts:66:20)
    at async Module.respond (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///home/ubuntu/chat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22
```
Hey! Thanks for filing a report. I'd like to look into it but I'm going to need a few more details.
> Did you set up your own `SERPAPI_KEY` env variable in your `.env.local`?
> Are you using a custom `MODELS` env variable?
Yes:

```
MODELS=`[
  {
    "endpoints": [
      {"url": "http://127.0.0.1:8080/generate_stream", "weight": 100}
    ],
    "name": "...",
    "userMessageToken": "<|prompter|>",
    "assistantMessageToken": "<|assistant|>",
    "messageEndToken": "</s>",
    "preprompt": "Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 1024
    }
  }
  ...
]`
```
> If so, what models are you using?

LLaMA, Open Assistant, Falcon.
> What DB are you using?

```
MONGODB_DB_NAME=demo
MONGODB_URL=mongodb://127.0.0.1:27017/
MONGODB_DIRECT_CONNECTION=false
```
> Does chat-ui work without the websearch?

Yes, but strangely I also see the `Invalid inference output` error even though I still get a valid response in the UI. When I enable web search, I see the error in the UI. Hmm.
```
InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///home/ubuntu/chat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/home/ubuntu/chat/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/home/ubuntu/chat/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/home/ubuntu/chat/src/hooks.server.ts:66:20)
    at async Module.respond (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///home/ubuntu/chat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22
```
```
docker run --gpus 4 --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:latest --model-id TheBloke/OpenAssistant-SFT-7-Llama-30B-HF --num-shard 4 --quantize bitsandbytes
```
I think the error above is a separate issue I'm having with `summarize/+server.ts`: my conversation titles are all "Untitled", probably because summarization isn't working for whatever reason. I'll create an issue for that separately.
Any hints on how I can debug the web search issue? Or is it getting stuck on the summarize issue above?
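One quick check, which the error message itself suggests: make the same call with no type checking and look at the raw output. A minimal `fetch` sketch (Node 18+; the URL is the one from the `MODELS` config above):

```ts
// Sketch: POST the same kind of payload straight at the endpoint and
// inspect the raw reply, with no client-side type checking in the way.
const res = await fetch("http://127.0.0.1:8080/generate_stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ inputs: "test", parameters: { max_new_tokens: 16 } }),
});
console.log(res.headers.get("content-type")); // text/event-stream, not JSON
console.log(await res.text()); // "data: {...}" lines, one event per token
```

If the reply is an event stream rather than JSON, the `Expected Array<{generated_text: string}>` check in `@huggingface/inference` can never pass.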
The web search might be broken because it catches any error from the inference endpoint, while simple answers don't. That would explain why you still get answers to questions with websearch off, plus a console error... (I'm just guessing here, but it seems likely.)
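To make the guess concrete, here is a hypothetical sketch of control flow that would produce exactly these symptoms; the function names are made up for illustration, not chat-ui's actual code:

```ts
// Stubs standing in for calls to the inference endpoint (hypothetical).
const summarize = async (): Promise<string> => {
  throw new Error("InferenceOutputError: Invalid inference output");
};
const streamAnswer = async (): Promise<string> => "a normal answer";
const searchAndAnswer = async (q: string): Promise<string> => `answer for ${q}`;

async function handleMessage(useWebSearch: boolean): Promise<string> {
  if (useWebSearch) {
    // The web search path needs the summarized query first, so the
    // inference error propagates to the response and surfaces in the UI.
    const query = await summarize();
    return searchAndAnswer(query);
  }
  // The plain-chat path treats summarization (e.g. the conversation
  // title) as a side task: its failure is only logged, and the user
  // still gets a valid answer.
  summarize().catch((err) => console.error(err));
  return streamAnswer();
}

handleMessage(false).then(console.log); // answer arrives; error hits the console
```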
So the issue underpinning all this seems to be the `Invalid inference output` error you get from running your local models with text-generation-inference.
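Concretely, the stack traces point at `textGeneration` from `@huggingface/inference`, which makes a single POST and type-checks the JSON reply, while TGI's `/generate_stream` endpoint answers with a server-sent-events stream that can never pass that check. A sketch of the call against a local TGI server (the URL is illustrative):

```ts
import { HfInference } from "@huggingface/inference";

// Sketch: roughly the call the summarize route makes (per the stack
// trace), pointed at a local text-generation-inference server.
const hf = new HfInference();

const out = await hf.textGeneration({
  // Base URL, with no /generate_stream suffix: TGI then answers a plain
  // POST with JSON containing generated_text, which the client's type
  // check accepts. The streaming endpoint answers with SSE and fails it.
  model: "http://127.0.0.1:8080",
  inputs: "Summarize the conversation above in one short title:",
  parameters: { max_new_tokens: 32 },
});
console.log(out.generated_text);
```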
Thanks for the super detailed feedback, I'll have a deeper look at this
Any update on this or https://github.com/huggingface/chat-ui/issues/278?
I have the same problem.
> Did you set up your own `SERPAPI_KEY` env variable in your `.env.local`?
Yes. Also tried Serper.
> Are you using a custom `MODELS` env variable?
> If so, what models are you using?
Tried both, same result
> What DB are you using?

```
MONGODB_URL=mongodb://localhost:27017
```
> Does chat-ui work without the websearch?
Text generation always works, in both search mode and normal mode. There is no error message in the web UI in normal mode, but an error always pops up in the console.
```
InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///D:/Ablage/Projekte/Experimente/AI/huggingchat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/src/hooks.server.ts:66:20)
    at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///D:/Ablage/Projekte/Experimente/AI/huggingchat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22
```
Temp solution:
Say you have text-generation-inference running on http://1.1.1.1:8080.
In `src/lib/server/generateFromDefaultEndpoint.ts`, change
```ts
{
  model: endpoint.url,
  inputs: prompt,
  parameters: newParameters,
}
```
to
```ts
{
  model: `http://1.1.1.1:8080`,
  inputs: prompt,
  parameters: newParameters,
}
```
It also seems to fix the problem if you omit the `/generate_stream` from the URL in the model definition in the `.env` file. This means that the configuration should, for example, look like this:
```
MODELS=`[
  {
    "endpoints": [
      {"url": "http://127.0.0.1:8080"}
    ],
    "name": "...",
    ...
  }
  ...
]`
```
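To confirm an endpoint behaves the way the client expects before wiring it into `MODELS`, one possible sanity check (a sketch; Node 18+, URL illustrative) is to POST a small request to the base URL and look at the reply:

```ts
// Sketch: verify the base URL answers with plain JSON generation output
// rather than an event stream.
const res = await fetch("http://127.0.0.1:8080", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ inputs: "Hello", parameters: { max_new_tokens: 8 } }),
});
console.log(res.headers.get("content-type")); // expect application/json
console.log(await res.json()); // expect generated_text in the payload
```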
Experiencing the same issue here when trying to connect to a custom inference model running in a separate Docker container on port 8000.
> It seems to fix the problem if you omit the `/generate_stream` in the URL in the definition of the model in the `.env` file. This means that the configuration should, for example, look like this:
>
> ```
> MODELS=`[
>   {
>     "endpoints": [
>       {"url": "http://127.0.0.1:8080"}
>     ],
>     "name": "...",
>     ...
>   }
>   ...
> ]`
> ```
This solved the problem for me: it fixed the `InferenceOutputError`, and the web search is working.