Closed pcuenca closed 10 months ago
@ZachNagengast I further checked the Python code and brought over the relevant changes:
nil
. This happens, for example, in the Falcon tokenizer.Tests now pass for the various tokenizers we are considering, and comparing against the well-known Python unknown token ids.
I used an API similar to transformers. To do (in new PRs, potentially):
cc @ZachNagengast, feel free to review!