Closed: josharian closed this 2 months ago
This does not reproduce using the upstream llama3 tokenizer.model and tiktoken.
I think the same issue has been mentioned before; it is caused by the transformers layer's `clean_up_tokenization_spaces`. See https://github.com/huggingface/transformers/issues/31187
We are gonna deprecate and remove this flag 😉
Observe that the input has a space before the `!` and the output does not.
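The behavior above can be reproduced without a model at all: when `clean_up_tokenization_spaces` is enabled, transformers post-processes decoded text by stripping spaces before punctuation and contractions. A minimal standalone sketch of that cleanup step (assuming the simple string-replacement approach transformers uses; the exact replacement list in your installed version may differ):

```python
def clean_up_tokenization(out_string: str) -> str:
    """Sketch of the cleanup transformers applies after decoding
    when clean_up_tokenization_spaces=True: drop the space that
    tokenizers insert before punctuation and English contractions."""
    return (
        out_string.replace(" .", ".")
        .replace(" ?", "?")
        .replace(" !", "!")
        .replace(" ,", ",")
        .replace(" 'm", "'m")
        .replace(" 's", "'s")
        .replace(" n't", "n't")
        .replace(" 've", "'ve")
        .replace(" 're", "'re")
    )

# The space before "!" disappears, matching the report above.
print(clean_up_tokenization("hello !"))  # → hello!
```

This is why the round trip through the upstream tiktoken-based tokenizer does not reproduce the problem: the space removal happens in the transformers decode path, not in the tokenizer itself.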