huggingface / text-generation-inference

Large Language Model Text Generation Inference

Tokenizer's unset `eos_token_id` causes Galactica model to fail when using grammar #2340

Open sadra-barikbin opened 2 months ago

sadra-barikbin commented 2 months ago

Hi there!

The Galactica tokenizer's `eos_token_id` is not set, but it is set in the model config. In `CausalLM` we account for the tokenizer's `pad_token_id` being `None`, but not for its `eos_token_id`:

https://github.com/huggingface/text-generation-inference/blob/2b19d671b4d1020e31276477f278ca87cfa37a3c/server/text_generation_server/models/causal_lm.py#L547-L552
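
A minimal sketch of the kind of fallback that could sit next to that `pad_token_id` handling (illustrative only, not the actual TGI code; it assumes the model config carries a valid `eos_token_id`, as Galactica's does):

    # Hypothetical mirror of the pad_token_id fallback: if the tokenizer leaves
    # eos_token_id unset, borrow the value from the model config so downstream
    # consumers (e.g. the grammar FSM) always see a concrete token id.
    if tokenizer.eos_token_id is None and model.config.eos_token_id is not None:
        tokenizer.eos_token_id = model.config.eos_token_id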

On the other hand, Outlines' `RegexFSM` returns a `Write` instruction containing `self.eos_token_id` as its final step, and that value is `None` in our case:

        next_tokens_to_end_states = self.states_to_token_maps.get(state)
        if next_tokens_to_end_states is None:
            # Final state: instruct the generator to emit EOS. If the tokenizer
            # never set eos_token_id, this becomes Write([None]).
            return Write([self.eos_token_id])
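
The mismatch behind that `None` is easy to confirm with `transformers` (a quick check; `facebook/galactica-125m` is just one example checkpoint of the family):

    from transformers import AutoConfig, AutoTokenizer

    repo = "facebook/galactica-125m"  # example Galactica checkpoint
    tokenizer = AutoTokenizer.from_pretrained(repo)
    config = AutoConfig.from_pretrained(repo)

    print(tokenizer.eos_token_id)  # None: the tokenizer leaves it unset
    print(config.eos_token_id)     # a concrete id from the model config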

This then causes `GrammarLogitProcessor.__call__` to fail when biasing the logits:

https://github.com/huggingface/text-generation-inference/blob/2b19d671b4d1020e31276477f278ca87cfa37a3c/server/text_generation_server/utils/logits_process.py#L501-L503

File "/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/logits_process.py", line 506, in __call__
    mask[:, allowed_tokens] = 0
RuntimeError: Could not infer dtype of NoneType
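
The failure is reproducible outside TGI: indexing a tensor with a Python list that contains `None` makes PyTorch try to build an index tensor from it, which raises exactly this error. A standalone repro (not TGI code):

    import math

    import torch

    mask = torch.full((1, 8), -math.inf)  # stand-in for the logits-shaped mask
    allowed_tokens = [None]  # what Write([self.eos_token_id]) yields when eos_token_id is None
    mask[:, allowed_tokens] = 0  # RuntimeError: Could not infer dtype of NoneType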
ErikKaum commented 2 months ago

Thanks for reporting the bug @sadra-barikbin 👍

I'll ping @drbh here as well (if you have bandwidth).