guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Error with non-English sentence #470

Open QuangBK opened 9 months ago

QuangBK commented 9 months ago

The bug

Thank you for the new version! I'm working with Llama-2-13B-chat. It works fine in English, but when I add some non-English sentences to the prompt, it gives the error below.

To Reproduce

My model and prompt with non-English sentences.

import torch
from transformers import AutoModelForCausalLM, GPTQConfig, AutoTokenizer
from guidance import models, gen, select
import guidance

PATH_MODEL = '/content/model_13b_translation/TheBloke/Llama_2_13B_chat_GPTQ'

gptq_config = GPTQConfig(bits=4, use_exllama=True, exllama_config={"version":2})
model = AutoModelForCausalLM.from_pretrained(PATH_MODEL, device_map="auto", trust_remote_code=True, quantization_config=gptq_config)
tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)

gmodel = models.Transformers(model=model, tokenizer=tokenizer)

prompt = '''[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:'''

When I use Guidance

lm = gmodel + prompt + gen(stop='\n', max_tokens=200)

It gives this error.

[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-15-a90df77c853d> in <cell line: 1>()
----> 1 lm = gmodel + prompt + gen(stop='\n', max_tokens=200)

2 frames
/usr/local/lib/python3.10/dist-packages/guidance/models/_local.py in __call__(self, grammar, max_tokens, n, top_p, temperature, ensure_bos_token, log_probs)
    331             # if we cannot consume any more tokens then we are done
    332             if not is_forced and token_pos < len(sampled_token) and trie == self._token_trie:
--> 333                 assert parser.matched(), "We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?"
    334 
    335                 # TODO: if we exactly match the end of the pattern then we can commit to this last token

AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?

However, without Guidance, it works well as below:

input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()
outputs = model.generate(input_ids, max_length=250)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>
Please translate this question from Vietnamese to English: "Có một cửa hàng sushi rất ngon ở bên kia đường." [/INST] Sure, the English version of that sentence is:

"There's a very delicious sushi shop over there on the street."

adamwithit commented 9 months ago

same problem, but it happens in Chinese

4sunshine commented 9 months ago

Inference in Chinese doesn't work for me, +1

zhangyi999-g commented 9 months ago

> Inference in Chinese doesn't work for me, +1

+1. Is there any solution?

anhvth commented 9 months ago

I've been investigating the issue, and the problem seems to be with the byte-level tokenizer that uses a trie. From what I understand, the method _tokenize_prefix is designed to return the longest valid byte-level token match. However, the assumption underlying that byte-level tokenization, as implemented in this repository, does not seem to hold for all tokenizers.
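
To illustrate the failure mode with a standalone sketch (plain Python, not guidance code): every accented character in the example prompt expands to two or three UTF-8 bytes, and Llama-style tokenizers may cover such characters with byte-fallback tokens, so a greedy longest-prefix walk over a byte trie can stop in the middle of a multi-byte sequence that no single vocabulary token completes.

# Illustration only: non-ASCII characters span multiple UTF-8 bytes,
# so a byte-level trie must be able to match partial multi-byte sequences.
text = "Có một cửa hàng sushi rất ngon ở bên kia đường."
print(len(text), "characters ->", len(text.encode("utf-8")), "UTF-8 bytes")

for ch in "óộđ":
    print(repr(ch), "->", ch.encode("utf-8"))
# 'ó' -> b'\xc3\xb3'      (2 bytes)
# 'ộ' -> b'\xe1\xbb\x99'  (3 bytes)
# 'đ' -> b'\xc4\x91'      (2 bytes)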

To address this, my suggestion is to switch to the default tokenizer. The following code snippet outlines the proposed change:

def _tokenize_prefix(self, prompt):
    # If the prompt arrives as raw bytes, decode it back to a UTF-8 string
    # and let the original (Hugging Face) tokenizer produce the token ids.
    if isinstance(prompt, bytes):
        prompt = prompt.decode("utf-8")
    return self._orig_tokenizer(prompt).input_ids, []

This modification ensures that if the prompt is in bytes, it gets decoded to a UTF-8 string before tokenization. This approach might be more robust and universally applicable.
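
As a separate sanity check, here is a small sketch (reusing the PATH_MODEL from the original report) that confirms the underlying Hugging Face tokenizer itself can encode and decode the Vietnamese text, which isolates tokenizer coverage from guidance's trie logic:

from transformers import AutoTokenizer

PATH_MODEL = '/content/model_13b_translation/TheBloke/Llama_2_13B_chat_GPTQ'  # path from the report above
tokenizer = AutoTokenizer.from_pretrained(PATH_MODEL)
text = 'Có một cửa hàng sushi rất ngon ở bên kia đường.'
ids = tokenizer(text, add_special_tokens=False).input_ids
# Decoding should give the text back (up to SentencePiece whitespace handling);
# if it does, the assertion comes from guidance's byte trie, not from the tokenizer itself.
print(tokenizer.decode(ids))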

MINGYUK commented 8 months ago

Same thing happening to me in Korean.

@anhvth I can't seem to find _orig_tokenizer in the source code. Where can I find it? Additionally, encoding the prompt into UTF-8 bytes already seems to be implemented:

    def __call__(self, grammar, max_tokens=1000000, n=1, top_p=1, temperature=0.0, ensure_bos_token=True):
        assert n == 1, "Still need to add support for n > 1!"

        # get our current context in bytes
        prompt = self._current_prompt()
        prompt = bytes(prompt, encoding="utf-8")

freckletonj commented 8 months ago

Same problem, I've left a comment on a related ticket: https://github.com/guidance-ai/guidance/issues/454#issuecomment-1878149397

MINGYUK commented 8 months ago

Update: using llama.cpp instead of transformers solved the problem for me.
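
For anyone else trying that route, a minimal sketch (assuming llama-cpp-python is installed, a GGUF conversion of the chat model exists at a hypothetical local path, and extra keyword arguments are forwarded to llama_cpp.Llama):

from guidance import models, gen

# Hypothetical local path to a GGUF conversion of the same chat model;
# `prompt` is the same prompt string as in the original report.
lm = models.LlamaCpp("/content/llama-2-13b-chat.Q4_K_M.gguf", n_gpu_layers=-1, n_ctx=4096)
lm += prompt + gen(stop='\n', max_tokens=200)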

freckletonj commented 8 months ago

@MINGYUK good to know, thanks. Unfortunately llama.cpp doesn't work with GPTQ.

daioba commented 5 months ago

I had the same problem when using Japanese in the prompts. However, after applying commit 8f5b3bdfe28455ef267da3e0e590a0d9a4d08104, the error disappeared. I don't understand the details well enough to explain them yet, but I'm sharing this for reference. I hope it helps someone.
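
If others want to try the same thing before it lands in a release, one option (assuming installing from source works in your environment) is to install guidance directly from GitHub at or after that commit, e.g. pip install git+https://github.com/guidance-ai/guidance.git@8f5b3bdfe28455ef267da3e0e590a0d9a4d08104, or pip install --upgrade guidance once a release containing it is available.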