This PR sets the default stop_token_ids of TokenizerInfo be the eos_token_id of the huggingface tokenizer.
Previously we auto-detect the stop token ids based on a set of builtin stop token strings. However, some downstream frameworks may not recognize the stop token ids detected by us.
After this PR, the auto-detection only happens when the tokenizer does not have a eos_token_id defined.
This PR sets the default
stop_token_ids
ofTokenizerInfo
be theeos_token_id
of the huggingface tokenizer.Previously we auto-detect the stop token ids based on a set of builtin stop token strings. However, some downstream frameworks may not recognize the stop token ids detected by us.
After this PR, the auto-detection only happens when the tokenizer does not have a eos_token_id defined.