Hey, @flatsiedatsie, could it be related to the prompt you're using?
I tried it and got a good response.
Here's the prompt I used:
<|im_start|>user
Explain quantum computing like I'm five<|im_end|>
<|im_start|>assistant
Could you try again with this one?
Console info, for reference:
@felladrin Thank you for having a look. I didn't have time to look into the details, but it seems Qwen models are quite sensitive to chat templates (due to their small size, there is no room for error).
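To illustrate that sensitivity, here's a sketch of the kind of template drift that could trip up a model this small (the exact failure mode is an assumption on my part, not something I verified):

// Well-formed ChatML: every turn is closed with <|im_end|>.
const goodPrompt =
  "<|im_start|>user\n" +
  "Explain quantum computing like I'm five<|im_end|>\n" +
  "<|im_start|>assistant\n";

// Malformed variant: the user turn is never closed. A larger model can
// often recover from this; a 0.5B model may emit garbage or nothing.
const badPrompt =
  "<|im_start|>user\n" +
  "Explain quantum computing like I'm five\n" +
  "<|im_start|>assistant\n";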
Please let me know if that works for you @flatsiedatsie
Thanks for testing on your end.
I managed to get output once, but only once.
I'm using the tokenizer from Transformers.js to generate the prompts. There was an issue with that, but as far as I can tell it was fixed a while ago. This process uses the Jinja2 chat templates stored on Hugging Face.
import { AutoTokenizer } from "@xenova/transformers";

const tokenizer = await AutoTokenizer.from_pretrained(config_url);
return tokenizer.apply_chat_template(messages, { tokenize: false, return_tensor: false, add_generation_prompt: true });
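For context, the input looks something like this (a minimal sketch; the message content is just the example from above):

// Messages array passed to apply_chat_template above.
const messages = [
  { role: "user", content: "Explain quantum computing like I'm five" },
];

// With add_generation_prompt: true, the Qwen chat template should produce
// the same ChatML string as in your example:
// <|im_start|>user
// Explain quantum computing like I'm five<|im_end|>
// <|im_start|>assistant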
Your hint about the sensitivity to the prompt is very useful though. I'm doing some tests now.
It's working now. I'm not even sure why :-D
I noticed something else interesting: this tiny model returns an empty string whenever I query it:
https://huggingface.co/Qwen/Qwen1.5-0.5B-Chat-GGUF/resolve/main/qwen1_5-0_5b-chat-q4_0.gguf
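For reference, this is roughly how I'm checking for the empty-string case (a sketch: the tokenizer repo is an assumption, and generate() is a placeholder for whatever inference call the runtime exposes, not a real API):

import { AutoTokenizer } from "@xenova/transformers";

// Chat tokenizer for the model above; the GGUF itself is loaded by the runtime.
const tokenizer = await AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat");
const prompt = tokenizer.apply_chat_template(
  [{ role: "user", content: "Explain quantum computing like I'm five" }],
  { tokenize: false, add_generation_prompt: true }
);

const output = await generate(prompt); // placeholder for the actual inference call
if (output.trim() === "") {
  console.warn("Model returned an empty string for prompt:", prompt);
}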