Open Ki-Seki opened 4 months ago
Hey! Thanks for opening an issue. A few things first: you are using a custom / local checkpoint with trust remote code.
The fast tokenizer does not error out when you feed it OOV ids, while the slow one does, so the behavior is indeed inconsistent. Would you like to open a PR for a fix? 🤗
Yes, I'll try that. Thank you for your reply!
@ArthurZucker @Ki-Seki can I work on it if it's not fixed yet?
I'm OK with that. I've been busy with other things recently. 😭
Sure 🤗
System Info
transformers version: 4.35.2
Who can help?
@ArthurZucker and @younesbelkada
Reproduction
Output:
Expected behavior
Consistent decode behavior in the slow tokenizer and the fast tokenizer when an id exceeds the vocab size. For example, instead of raising an exception, the slow tokenizer could output an empty string, as the fast tokenizer does.
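To make the requested behavior concrete, here is a minimal, self-contained sketch that does not use transformers at all. The vocabulary and the function names `strict_decode` and `lenient_decode` are hypothetical and only mirror the two behaviors described in this issue. The `KeyError` is simply what the toy raises, not necessarily the exception the slow tokenizer raises.

```python
from typing import Dict, List

# Hypothetical toy vocabulary for illustration; not taken from any real checkpoint.
ID_TO_TOKEN: Dict[int, str] = {0: "hello", 1: " ", 2: "world"}


def strict_decode(ids: List[int]) -> str:
    """Stand-in for the reported slow-tokenizer behavior: an unknown id raises."""
    # Direct indexing raises KeyError when an id is outside the vocabulary.
    return "".join(ID_TO_TOKEN[i] for i in ids)


def lenient_decode(ids: List[int]) -> str:
    """Stand-in for the reported fast-tokenizer behavior: an unknown id becomes ""."""
    # dict.get with a default maps out-of-vocabulary ids to an empty string.
    return "".join(ID_TO_TOKEN.get(i, "") for i in ids)


if __name__ == "__main__":
    ids = [0, 1, 2, 99]  # 99 exceeds the toy vocabulary size
    print(repr(lenient_decode(ids)))
    try:
        strict_decode(ids)
    except KeyError as err:
        print("strict_decode raised KeyError for id", err)
```

In this sketch, the consistency asked for above would amount to making the strict path behave like the lenient one (or the reverse), so that both code paths agree on ids beyond the vocab size.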