horsten opened this issue 2 months ago
Hi! You can now use the tokenizer to format your chat template. This is what we do in production (see here) and is why we haven't updated the prompts recently. I'll update the docs to mention this more clearly.
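For reference, the tokenizer-based approach amounts to something like this sketch (illustrative code using the public @huggingface/transformers API, not the exact chat-ui implementation; the tokenizer repo name is just an example):

```ts
import { AutoTokenizer } from "@huggingface/transformers";

// Load tokenizer.json + tokenizer_config.json from a hub repo that hosts them.
// The repo name here is only an example; any repo with the Llama 3.1 tokenizer files works.
const tokenizer = await AutoTokenizer.from_pretrained("nsarrazin/llama3.1-tokenizer");

const messages = [
	{ role: "system", content: "You are a helpful assistant." },
	{ role: "user", content: "Hello!" },
];

// Render the chat template to a prompt string instead of token ids.
const prompt = tokenizer.apply_chat_template(messages, {
	tokenize: false,
	add_generation_prompt: true,
});
```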
Thanks, I suspected you were doing something like that. My problem is that I'm using a rather hackish, hybrid solution to get tool support working. I'm running inference on fireworks.ai, which only offers its own version of the OpenAI API endpoint plus a custom one. Neither gives me sufficient control over the template, so I had to use the completions API instead of chat completions and format the prompt in chat-ui with JavaScript (it's trivial in Python, where the standard template is easy to apply with Jinja2, but less so in JS). For now I've implemented a hardcoded template generator, as sketched below, and it will do.
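The hardcoded generator boils down to something like the following sketch (illustrative only; the special tokens are the standard Llama 3.1 Instruct ones):

```ts
// Rough sketch of a hardcoded Llama 3.1 Instruct prompt builder for the
// completions API (the stopgap hack mentioned above, not production code).
interface Message {
	role: "system" | "user" | "assistant";
	content: string;
}

function buildLlama31Prompt(messages: Message[]): string {
	let prompt = "<|begin_of_text|>";
	for (const m of messages) {
		prompt += `<|start_header_id|>${m.role}<|end_header_id|>\n\n${m.content}<|eot_id|>`;
	}
	// Leave the prompt open for the assistant's reply.
	prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n";
	return prompt;
}
```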
I didn't think I could just load the tokenizer unless I was using a TGI endpoint, but maybe I'm wrong? A quick attempt doesn't look promising:
```
err: {
  "type": "TypeError",
  "message": "this.added_tokens.toSorted is not a function",
  "stack":
      TypeError: this.added_tokens.toSorted is not a function
          at new PreTrainedTokenizer (file:///home/th/sec/src/llmweb/chat-ui/node_modules/@huggingface/transformers/dist/transformers.mjs:22814:18)
          at Module.getTokenizer (/home/th/sec/src/llmweb/chat-ui/src/lib/utils/getTokenizer.ts:12:12)
[12:01:32.743] ERROR (2642393): Failed to load tokenizer for model accounts/fireworks/models/llama-v3p1-8b-instruct consider setting chatPromptTemplate manually or making sure the model is available on the hub.
          at Module.getTokenizer (/home/th/sec/src/llmweb/chat-ui/src/lib/utils/getTokenizer.ts:12:12)
          at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
          at async Object.endpointOai [as openai] (/home/th/sec/src/llmweb/chat-ui/src/lib/server/endpoints/openai/endpointOai.ts:76:19)
          at async Object.getEndpoint (/home/th/sec/src/llmweb/chat-ui/src/lib/server/models.ts:270:20)
          at async Object.start (/home/th/sec/src/llmweb/chat-ui/src/routes/conversation/[id]/+server.ts:330:21)
}
```
(I put `"tokenizer": {"tokenizerUrl": "https://huggingface.co/nsarrazin/llama3.1-tokenizer/resolve/main/tokenizer.json", "tokenizerConfigUrl": "https://huggingface.co/nsarrazin/llama3.1-tokenizer/raw/main/tokenizer_config.json"}` in my model definition, checked that I can fetch both files with curl from the server, and tried to get the tokenizer with `tokenizer = await getTokenizer(m.tokenizer)`, the same way it's done in models.ts.)
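In other words, the goal is roughly the following (an illustrative sketch, not the actual getTokenizer implementation; the helper name is made up):

```ts
import { PreTrainedTokenizer } from "@huggingface/transformers";

// Sketch of building a tokenizer from explicit URLs, roughly what the
// tokenizerUrl / tokenizerConfigUrl settings are meant to enable.
async function loadTokenizerFromUrls(
	tokenizerUrl: string,
	tokenizerConfigUrl: string
): Promise<PreTrainedTokenizer> {
	const tokenizerJSON = await (await fetch(tokenizerUrl)).json();
	const tokenizerConfig = await (await fetch(tokenizerConfigUrl)).json();
	return new PreTrainedTokenizer(tokenizerJSON, tokenizerConfig);
}

const tokenizer = await loadTokenizerFromUrls(
	"https://huggingface.co/nsarrazin/llama3.1-tokenizer/resolve/main/tokenizer.json",
	"https://huggingface.co/nsarrazin/llama3.1-tokenizer/raw/main/tokenizer_config.json"
);
```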
EDIT: I now see that's not the problem and it should work; it seems to be some kind of dependency issue, which could hint that the dependencies in package.json need an update? EDIT 2: `npm upgrade @huggingface/transformers` was enough. The tokenizer now works, so I can scrap the ugly hack I'd made for template generation.
Can you provide any insight into how I should integrate tool support "cleanly" in my scenario? I'm currently relying on a bunch of hacks based on (outdated) documentation, guesswork, and experimentation, and it could work better...
Bug description
In README.md, it's stated that the prompts used in production for HuggingChat can be found in PROMPTS.md.
However, PROMPTS.md has not been updated in 7 months, and prompts for several newer models are missing.