huggingface / chat-ui

Open source codebase powering the HuggingChat app
https://huggingface.co/chat
Apache License 2.0

[web search] An error occurred with the web search "Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking." #274

Open · cfregly opened this issue 1 year ago

cfregly commented 1 year ago
[screenshot: the web search error shown in the chat UI]
cfregly commented 1 year ago

InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///home/ubuntu/chat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/home/ubuntu/chat/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/home/ubuntu/chat/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/home/ubuntu/chat/src/hooks.server.ts:66:20)
    at async Module.respond (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///home/ubuntu/chat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22
nsarrazin commented 1 year ago

Hey! Thanks for filing a report. I'd like to look into it but I'm going to need a few more details.

cfregly commented 1 year ago

> Are you using a custom MODELS env variable?

Yes:

MODELS=`[
  {
    "endpoints": [
        {"url": "http://127.0.0.1:8080/generate_stream", "weight": 100}
    ],
    "name": "...",
    "userMessageToken": "<|prompter|>",
    "assistantMessageToken": "<|assistant|>",
    "messageEndToken": "</s>",
    "preprompt": "Below are a series of dialogues between various people and an AI assistant. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n",
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 1000,
      "max_new_tokens": 1024
    }
  }
...
]`

> If so, what models are you using? LLaMA, OpenAssistant, Falcon?

> What DB are you using?

MONGODB_DB_NAME=demo
MONGODB_URL=mongodb://127.0.0.1:27017/
MONGODB_DIRECT_CONNECTION=false

> Does chat-ui work without the websearch?

Yes, but strangely, I also see the Invalid inference output error, yet I still get a valid response in the UI. When I enable web search, I see the error in the UI. Hmm.

InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///home/ubuntu/chat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/home/ubuntu/chat/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/home/ubuntu/chat/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/home/ubuntu/chat/src/hooks.server.ts:66:20)
    at async Module.respond (/home/ubuntu/chat/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///home/ubuntu/chat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22
cfregly commented 1 year ago
docker run --gpus 4 --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:latest --model-id TheBloke/OpenAssistant-SFT-7-Llama-30B-HF --num-shard 4 --quantize bitsandbytes
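
For anyone hitting this with a similar setup, one quick way to see what the TGI container actually returns is to probe both routes directly. A minimal sketch (not part of chat-ui), assuming the default text-generation-inference routes and the -p 8080:80 port mapping from the docker command above:

// probe.ts — sketch: compare what TGI's /generate and /generate_stream
// routes return, to see why a typed client would reject the latter.
const BASE = "http://127.0.0.1:8080";

async function probe(): Promise<void> {
  const body = JSON.stringify({
    inputs: "Hello",
    parameters: { max_new_tokens: 16 },
  });
  const headers = { "Content-Type": "application/json" };

  // /generate returns a single JSON object: { "generated_text": "..." }
  const res = await fetch(`${BASE}/generate`, { method: "POST", headers, body });
  console.log("/generate ->", await res.json());

  // /generate_stream returns server-sent events ("data: {...}" lines),
  // which can never match the Array<{generated_text: string}> shape the
  // typed textGeneration call expects.
  const stream = await fetch(`${BASE}/generate_stream`, { method: "POST", headers, body });
  console.log("/generate_stream content-type:", stream.headers.get("content-type"));
}

probe().catch(console.error);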
cfregly commented 1 year ago

I think the error above is another issue I'm having with summarize/+server.ts: my conversation titles are all "Untitled", probably because summarization isn't working for whatever reason. I'll create a separate issue for that.

Any hints on how I can debug the web search issue? Or is it getting stuck on the summarize/ issue above?

nsarrazin commented 1 year ago

The web search might be broken because it catches any error from the inference endpoint, while the plain-answer path doesn't. That would explain why you still get answers to questions with websearch off, plus a console error... (I'm just guessing here, but it seems likely.)

So the issue underpinning all this seems to be the Invalid inference output error you get from running your local models with text-generation-inference.

Thanks for the super detailed feedback, I'll have a deeper look at this
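
For context, the check that throws here has roughly the following shape; this is a paraphrase reconstructed from the error message, not the actual @huggingface/inference source:

// Sketch of the output guard implied by the error text. An SSE response
// from /generate_stream is a stream of "data: {...}" events, so it can
// never satisfy this predicate.
interface TextGenerationOutput {
  generated_text: string;
}

function isTextGenerationOutput(res: unknown): res is TextGenerationOutput[] {
  return (
    Array.isArray(res) &&
    res.every((x) => typeof (x as TextGenerationOutput)?.generated_text === "string")
  );
}

// The library then does, in effect:
// if (!isTextGenerationOutput(res)) {
//   throw new InferenceOutputError("Expected Array<{generated_text: string}>");
// }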

cfregly commented 1 year ago

any update on this or https://github.com/huggingface/chat-ui/issues/278 ?

secondtruth commented 1 year ago

I have the same problem.

> Did you set up your own SERPAPI_KEY env variable in your .env.local?

Yes. Also tried Serper.

> Are you using a custom MODELS env variable?

Yes:

MODELS=`[
  {
    "name": "OpenAI GPT-3.5",
    "description": "OpenAI's second-best performing model (ChatGPT)",
    "websiteUrl": "https://openai.com",
    "endpoints": [{"url": "http://127.0.0.1:8000/generate_stream"}],
    "userMessageToken": "User: ",
    "assistantMessageToken": "Assistant: ",
    "messageEndToken": "\n",
    "preprompt": "You are a helpful assistant named secondChat.",
    "parameters": {
      "temperature": 0.9,
      "max_new_tokens": 500,
      "truncate": 500
    }
  },
  {
    "name": "OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
    "displayName": "OpenAssistant",
    "description": "A good alternative to ChatGPT",
    "websiteUrl": "https://open-assistant.io",
    "datasetName": "OpenAssistant/oasst1",
    "userMessageToken": "<|prompter|>",
    "assistantMessageToken": "<|assistant|>",
    "messageEndToken": "",
    "preprompt": "Below are a series of dialogues between a human and an AI assistant. The assistant is named \"secondChat\" (spelled exactly that way) and was developed by secondtruth. The AI tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble-but-knowledgeable. The assistant is happy to help with almost anything, and will do its best to understand exactly what is needed. It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer. That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.\n-----\n",
    "promptExamples": [
      {
        "title": "Write an email from bullet list",
        "prompt": "As a restaurant owner, write a professional email to the supplier to get these products every week: \n\n- Wine (x10)\n- Eggs (x24)\n- Bread (x12)"
      },
      {
        "title": "Code a snake game",
        "prompt": "Code a basic snake game in python, give explanations for each step."
      },
      {
        "title": "Assist in a task",
        "prompt": "How do I make a delicious lemon cheesecake?"
      }
    ],
    "parameters": {
      "temperature": 0.9,
      "top_p": 0.95,
      "repetition_penalty": 1.2,
      "top_k": 50,
      "truncate": 500,
      "max_new_tokens": 300
    }
  }
]`

> If so, what models are you using?

Tried both, same result.

> What DB are you using?

MONGODB_URL=mongodb://localhost:27017

> Does chat-ui work without the websearch?

Text generation always works, in both search mode and normal mode. There is no error message in the web UI when using normal mode, but an error always pops up in the console.

InferenceOutputError: Invalid inference output: Expected Array<{generated_text: string}>. Use the 'request' method with the same parameters to do a custom call with no type checking.
    at Proxy.textGeneration (file:///D:/Ablage/Projekte/Experimente/AI/huggingchat/node_modules/@huggingface/inference/dist/index.mjs:460:11)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async Module.generateFromDefaultEndpoint (/src/lib/server/generateFromDefaultEndpoint.ts:22:28)
    at async POST (/src/routes/conversation/[id]/summarize/+server.ts:30:26)
    at async Module.render_endpoint (/node_modules/@sveltejs/kit/src/runtime/server/endpoint.js:47:20)
    at async resolve (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:388:17)
    at async Object.handle (/src/hooks.server.ts:66:20)
    at async Module.respond (/node_modules/@sveltejs/kit/src/runtime/server/respond.js:259:20)
    at async file:///D:/Ablage/Projekte/Experimente/AI/huggingchat/node_modules/@sveltejs/kit/src/exports/vite/dev/index.js:506:22

seongminp commented 1 year ago

Temp solution:

Say you have text-generation-inference running on http://1.1.1.1:8080.

In src/lib/server/generateFromDefaultEndpoint.ts, change

{
    model: endpoint.url,
    inputs: prompt,
    parameters: newParameters,
}

to

{
    model: `http://1.1.1.1:8080`,
    inputs: prompt,
    parameters: newParameters,
}
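
The error message itself also points at a workaround: the library's untyped request call performs the same call with no output check. A hedged sketch of what that could look like; the import name and exact signature are assumptions about the @huggingface/inference version in use, and endpointUrl, prompt, and newParameters stand in for the values already in scope in generateFromDefaultEndpoint.ts:

import { request } from "@huggingface/inference";

// Sketch only: do the same call with no type checking, as the error
// message suggests. The raw response is returned as-is, so the caller
// must handle whatever shape the endpoint actually produces.
async function generateUnchecked(
  endpointUrl: string,
  prompt: string,
  newParameters: Record<string, unknown>
): Promise<unknown> {
  return request({
    model: endpointUrl,
    inputs: prompt,
    parameters: newParameters,
  });
}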
CookieKlecks commented 1 year ago

Omitting the /generate_stream suffix from the model's endpoint URL in the .env file seems to fix the problem. This means the configuration should look like this, for example:

MODELS=`[
  {
    "endpoints": [
        {"url": "http://127.0.0.1:8080"}
    ],
    "name": "...",
    ...
  }
  ...
]`
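
A defensive follow-up to this workaround: the URL could be normalized before it ever reaches the inference client, so a stray /generate_stream suffix in someone's .env can't break anything. The helper below is hypothetical, not part of chat-ui:

// Hypothetical helper: strip a trailing /generate or /generate_stream
// from a configured endpoint URL so the client can do its own routing.
function normalizeEndpointUrl(url: string): string {
  return url.replace(/\/generate(_stream)?\/?$/, "");
}

// normalizeEndpointUrl("http://127.0.0.1:8080/generate_stream")
//   -> "http://127.0.0.1:8080"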
GeorgeStrakhov commented 1 year ago

Experiencing the same issue here when trying to connect to a custom inference model running in a separate Docker container on port 8000.

schauppi commented 1 year ago

> Omitting the /generate_stream suffix from the model's endpoint URL in the .env file seems to fix the problem. This means the configuration should look like this, for example:

MODELS=`[
  {
    "endpoints": [
        {"url": "http://127.0.0.1:8080"}
    ],
    "name": "...",
    ...
  }
  ...
]`

This solved the problem for me - it fixed the InferenceOutputError and the web search is working.