guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

Abnormal output appears when decoding with guidance #869

Open · LuoKaiGSW opened this issue 1 month ago

LuoKaiGSW commented 1 month ago

I obtained a model for tool invocation through SFT. When I decode with guidance, the output is garbled; however, if I generate directly without guidance, the result is correct.

One reason I suspect: my model uses BloomTokenizer. Since this tokenizer can only be loaded with use_fast = True (only tokenization_bloom_fast.py is available), the resulting tokenizer has neither a byte_decoder nor an sp_model attribute, and according to the code, the byte_decoder from gpt2 is then read by default. Could this be the reason? What exactly do the byte_decoder and sp_model attributes do? The guidance version I am using is 0.1.15. Thank you!
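
Roughly what I understand the fallback selection to look like, as a paraphrased sketch of my reading of 0.1.15 (the attribute checks and the gpt2 default are my understanding, not the actual guidance source):

import transformers

def pick_byte_decoder(tokenizer):
    # paraphrased sketch, not the actual guidance code
    if hasattr(tokenizer, "byte_decoder"):
        # slow GPT-2-style tokenizers expose the unicode-char -> byte map directly
        return tokenizer.byte_decoder
    if hasattr(tokenizer, "sp_model"):
        # SentencePiece tokenizers: bytes are recovered through sp_model instead
        return None
    # fast-only tokenizers like BloomTokenizerFast have neither attribute,
    # so the byte_decoder of the slow gpt2 tokenizer is used as a default
    return transformers.AutoTokenizer.from_pretrained("gpt2", use_fast=False).byte_decoder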

Harsha-Nori commented 1 month ago

Hey Luo, thanks for reporting this! You're right that it might be related to the tokenizer we load (ultimately we need a byte -> token level mapping, as our grammars/constraints operate on byte strings and the LM operates on tokens :|).
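
To give a concrete picture, the mapping we need is roughly this shape (a simplified sketch, not the actual guidance code):

def build_token_bytes(tokenizer, byte_decoder):
    # turn each vocab entry back into raw bytes using the tokenizer's
    # byte_decoder (unicode char -> byte value), so the grammar engine can
    # match constraints at the byte level while the model emits token ids
    token_bytes = {}
    for tok_str, tok_id in tokenizer.get_vocab().items():
        # a KeyError here is the typical failure mode when the byte_decoder
        # doesn't actually match the tokenizer's vocabulary
        token_bytes[tok_id] = bytes(byte_decoder[ch] for ch in tok_str)
    return token_bytes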

Do you know if there's an open-source SFT model similar to yours that we can test against? I'm not entirely sure what the cause is, and it'd be extremely helpful to have something debuggable on our side. Of course, if you're comfortable sharing your model + test case, that would work even better (we can do this privately), but I'm assuming that'll be tough, so I'm hoping there's something roughly equivalent on Hugging Face that we can start debugging against.

LuoKaiGSW commented 1 month ago

Hi Nori, thank you for your reply! I'm sorry, but I'm not able to share the model with you at the moment. However, I can cooperate with you on some debugging, although there may be some inconveniences. So I'd like to ask: for BloomTokenizer, is it possible to assign it a byte_decoder attribute in some other way, similar to what is mentioned in issue #782? (I've put a rough sketch of what I mean after the screenshots below.) Also, here are screenshots of the responses with and without guidance.

with guidance (some prompts have been omitted):

[screenshot: output with guidance]

without guidance:

[screenshot: output without guidance]
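
The kind of workaround I have in mind is roughly this (untested sketch; it assumes Bloom's byte-level BPE uses the same byte-to-unicode table as GPT-2, which is exactly what I'd like to confirm, and model_path is a placeholder for my SFT model):

from transformers import AutoTokenizer

def gpt2_bytes_to_unicode():
    # the standard GPT-2 byte -> unicode-char table (same construction as in
    # the GPT-2 / huggingface tokenizer sources)
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))

tokenizer = AutoTokenizer.from_pretrained(model_path)  # BloomTokenizerFast
# byte_decoder is the inverse map: unicode char used in vocab strings -> raw byte
tokenizer.byte_decoder = {c: b for b, c in gpt2_bytes_to_unicode().items()}
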
LuoKaiGSW commented 1 month ago

In addition, I found that the same kind of problem also occurs with the open-source llama2-7b-hf. Could you take a look at this as well?

from guidance import models, gen

llama2 = models.Transformers(path)  # path: local path to llama2-7b-hf
lm = llama2 + 'Question: Luke has ten balls. He gives three to his brother.\n'
lm += 'How many balls does he have left?\n'
lm += 'Answer: ' + gen(regex=r'\d+')

error info

in Engine._cleanup_tokens(self, token_ids, token_byte_positions)
    839     # another ugly hack for tokenizers that are not stable on encode/decode cycles
    840     # currently only Phi-3, should generalize this method if we see more of these
    841     if not hasattr(self, "_disable_retokenize_check"):
--> 842         assert token_byte_positions[-1] == last_pos, "Cross check last_pos"
    844 return token_ids, token_byte_positions

AssertionError: Cross check last_pos
Harsha-Nori commented 1 month ago

Thanks @LuoKaiGSW -- this is really helpful! For the llama2-7b-hf model, are you loading it via Transformers or is this a GGUF loaded via LlamaCPP?

LuoKaiGSW commented 1 month ago

hi, I load llama2-7b-hf through Transformers.

LuoKaiGSW commented 1 month ago

Hey @Harsha-Nori, I'm very sorry to bother you. How should I implement a byte_decoder for the Bloom tokenizer? Many thanks!

riedgar-ms commented 1 month ago

@LuoKaiGSW FYI, I have changed the last_pos assertion into a warning, since I noticed it starting to cause trouble with Llama models in the latest release. We've been talking about it a lot, since it is something that should work (it's basically checking a 'roundtrip' through the tokeniser); a failure indicates that something is probably going wrong, but it isn't quite definitive.
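
For anyone following along, the idea behind the check is roughly this (a simplified illustration, not the actual _cleanup_tokens code):

def roundtrip_ok(tokenizer, token_ids):
    # decode the tokens one at a time, accumulate byte positions, and check the
    # last position lands on the length of the full decoded string; tokenizers
    # whose per-token decode doesn't concatenate to the full decode (e.g.
    # leading-space handling) trip this check even when generation is fine
    full_bytes = tokenizer.decode(token_ids).encode("utf-8")
    pos = 0
    for tid in token_ids:
        pos += len(tokenizer.decode([tid]).encode("utf-8"))
    return pos == len(full_bytes)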

LuoKaiGSW commented 1 month ago

Thank you for your reply, @riedgar-ms. I will retest the performance of llama2-7b-hf. In addition, regarding the other issue I mentioned above: since BloomTokenizerFast has neither byte_decoder nor sp_model properties, I guess the mapping between UTF-8 bytes and unicode strings is implemented somewhere in its code, but I couldn't find the relevant code in BloomTokenizerFast. Do you know where this part of the code lives? I want to confirm whether it uses the same mapping as gpt2, i.e. whether it is correct for the guidance code to substitute gpt2's byte_decoder.
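
Concretely, the check I plan to run looks roughly like this (a rough sketch; gpt2_byte_decoder is the inverted GPT-2 byte-to-unicode table from my sketch above):

def check_byte_decoder(tokenizer, gpt2_byte_decoder):
    # collect every character that appears in the fast tokenizer's vocab strings
    # but has no entry in the GPT-2 byte_decoder; added special tokens aside,
    # an empty result would suggest the two use the same byte -> unicode mapping
    missing = set()
    for tok_str in tokenizer.get_vocab():
        missing.update(ch for ch in tok_str if ch not in gpt2_byte_decoder)
    return missing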

riedgar-ms commented 1 month ago

@LuoKaiGSW I'm afraid I can't help much with that; I'm just learning about the requirements placed on the Tokenizer myself. I can say that there are places where we 'give up' and default to GPT-2 tokenisation, but ideally the tokeniser should match the model.

LuoKaiGSW commented 1 month ago

Okay, thank you very much. I will continue to explore the related implementations.