ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Differences in template application #127

Open flatsiedatsie opened 4 days ago

flatsiedatsie commented 4 days ago

After switching to the Jinja templating engine, I got the feeling that my default model (Danube 3 500m) wasn't giving the same answers.

So I ran a test comparing the old and new versions, and to my surprise there is a difference:

Transformers.js:

  <|prompt|>What's the difference between red and green apples?</s><|answer|>

Jinja:

  <|prompt|>What's the difference between red and green apples?<|answer|> 

I then tried their latest Q4_K_M .gguf supplied by h2o.ai (for Jinja) and also used their repo's latest config files (for Transformers.js). The result was the same.

I then compared the template that's embedded in the .gguf with the one in the config files. They were the same:

{% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}
{% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}

I then dove into the Jinja template code and realised I had already noticed something odd earlier: [object Map] showing up in the generated prompt. I had added some code to filter it out:

<|prompt|>Is the government of China a repressive regime?[object Map]<|answer|>
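For context, "[object Map]" is what JavaScript produces when a Map object is coerced to a string, so whatever ended up at that position in the rendered prompt was a Map rather than a string. A quick illustration (independent of any wllama API):

    // Coercing a Map to a string yields "[object Map]".
    const someMap = new Map([["key", "value"]]);
    console.log(`<|prompt|>${someMap}<|answer|>`);
    // -> <|prompt|>[object Map]<|answer|>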

Is the example code (from utils.js) I was using correct?

    // Template here presumably comes from the @huggingface/jinja package used by wllama's example.
    const defaultChatTemplate = "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}";

    const template = new Template(
        window.llama_cpp_app.getChatTemplate() ?? defaultChatTemplate,
    );

    let rendered = template.render({
        messages,
        bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),
        eos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getEOS()]),
        add_generation_prompt: true,
    });

    console.log("jinja: rendered: ", rendered);
flatsiedatsie commented 4 days ago

Here's a console log with the variables split out first:

[Screenshot: console output, 2024-10-08 13:05]
flatsiedatsie commented 4 days ago

I've solved it by decoding the result with new TextDecoder():

    const pre_bos_token = window.llama_cpp_app.getBOS();
    const pre_eos_token = window.llama_cpp_app.getEOS();
    console.log("jinja: pre_bos_token: ", pre_bos_token);
    console.log("jinja: pre_eos_token: ", pre_eos_token);

    // detokenize() returns a Uint8Array, so decode it to a string
    // before handing it to the template.
    let bos_token = await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]);
    let eos_token = await window.llama_cpp_app.detokenize([window.llama_cpp_app.getEOS()]);
    bos_token = new TextDecoder().decode(bos_token);
    eos_token = new TextDecoder().decode(eos_token);
    console.log("jinja: bos_token: ", bos_token);
    console.log("jinja: eos_token: ", eos_token);

    let rendered = template.render({
        messages,
        bos_token: bos_token,
        eos_token: eos_token,
        add_generation_prompt: true,
    });
    console.log("jinja: rendered: ", rendered);
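For what it's worth, the decode step could also be folded into a small helper so it can't be forgotten. A minimal sketch, assuming the wllama instance's detokenize(tokens) returns a Uint8Array (possibly wrapped in a promise); the helper name detokenizeToString is made up here:

    // Always decode the raw bytes from detokenize() into a JS string.
    async function detokenizeToString(
        app: { detokenize(tokens: number[]): Uint8Array | Promise<Uint8Array> },
        tokens: number[],
    ): Promise<string> {
        const bytes = await app.detokenize(tokens);
        return new TextDecoder().decode(bytes);
    }

    // Usage:
    // const bos_token = await detokenizeToString(window.llama_cpp_app, [window.llama_cpp_app.getBOS()]);
    // const eos_token = await detokenizeToString(window.llama_cpp_app, [window.llama_cpp_app.getEOS()]);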

(this issue can now be closed)

felladrin commented 4 days ago

That was a great investigation!

ngxson commented 4 days ago

bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),

This is one of the reasons why switching to TypeScript will save you some headaches.


The detokenize function always returns a Uint8Array, not a string. If you try to pass a Uint8Array into a function that expects a string, TypeScript will tell you not to do that.

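To make it concrete, here is a minimal sketch of the kind of error TypeScript raises; the detokenize declaration below is a stand-in written for illustration, not wllama's actual signature:

    // Stand-in declaration mirroring the behaviour described above:
    // detokenize() resolves to raw bytes, not text.
    declare function detokenize(tokens: number[]): Promise<Uint8Array>;

    async function eosAsString(eosTokenId: number) {
        // Error: Type 'Uint8Array' is not assignable to type 'string'.
        const eos: string = await detokenize([eosTokenId]);
        return eos;
    }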
flatsiedatsie commented 4 days ago

You do realize that code is from Wllama, right? :-D

https://github.com/ngxson/wllama/blob/ffcd98a8848160cc319896ef8736a605bc831478/examples/main/src/utils/utils.ts#L42

ngxson commented 2 days ago

Hmm, right, the object passed into template.render is not typed.

In TypeScript, we can also force the type check using satisfies; for example, this will cause the error I mentioned above:

bos_token: await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]) satisfies string,
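Applied to the render call from utils.ts, that would look roughly like the sketch below. It is only a sketch: with the raw detokenize() result the satisfies string check fails to compile, and it only passes once the Uint8Array is decoded, which is exactly the safety net being described.

    // satisfies makes the compiler verify each value's type even though
    // template.render() itself accepts an untyped object. Without the
    // TextDecoder step, satisfies string rejects the raw Uint8Array.
    let rendered = template.render({
        messages,
        bos_token: new TextDecoder().decode(
            await window.llama_cpp_app.detokenize([window.llama_cpp_app.getBOS()]),
        ) satisfies string,
        eos_token: new TextDecoder().decode(
            await window.llama_cpp_app.detokenize([window.llama_cpp_app.getEOS()]),
        ) satisfies string,
        add_generation_prompt: true,
    });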