Open yonigottesman opened 1 month ago
I think this is related to huggingface/transformers#25082, and it is more of a tokenizers issue
than a transformers one.
I don't have a fix, but it is indeed a bug.
So should I open the issue in that repo? This is really needed for huggingface/transformers#30650.
Yeah, it's basically the same as https://github.com/huggingface/tokenizers/issues/1553. Since the offsets are wrong, char_to_token, which just uses them, also returns wrong results. Let me transfer the issue!
Any progress on this one?
System Info
transformers version: 4.44.0
Who can help?
@ArthurZucker
Reproduction
The char_to_token call returns None for any char index, not just 0.
Also, token_to_chars doesn't return the expected results: out.token_to_chars(4) returns CharSpan(start=15, end=15) instead of CharSpan(start=15, end=19).
Expected behavior
The char_to_token call should return 1.
out.token_to_chars(4) should return CharSpan(start=15, end=19).
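To illustrate the point made in the comments (that char_to_token just uses the offsets, so wrong offsets give wrong lookups), here is a minimal pure-Python sketch. It is a simplified model, not the tokenizers implementation: the helper functions and the offset values are hypothetical, except for token 4's spans, which are the ones reported above. It assumes each token carries a (start, end) span with an exclusive end, and that token 0 is a special token with an empty span.

```python
from typing import List, Optional, Tuple

Offset = Tuple[int, int]  # (start, end) character span, end exclusive


def char_to_token(offsets: List[Offset], char_index: int) -> Optional[int]:
    """Return the first token whose span covers char_index, else None."""
    for token_index, (start, end) in enumerate(offsets):
        if start <= char_index < end:
            return token_index
    return None


def token_to_chars(offsets: List[Offset], token_index: int) -> Offset:
    """Return the stored span of a token unchanged."""
    return offsets[token_index]


# Hypothetical offsets: token 0 is a special token with an empty span.
# Only token 4's span (15, 19) is taken from the report.
good_offsets = [(0, 0), (0, 4), (5, 9), (10, 14), (15, 19)]

# The same tokens with every span collapsed to zero width,
# mimicking the reported CharSpan(start=15, end=15).
broken_offsets = [(start, start) for start, _ in good_offsets]

print(char_to_token(good_offsets, 0))     # 1
print(char_to_token(broken_offsets, 0))   # None
print(token_to_chars(good_offsets, 4))    # (15, 19)
print(token_to_chars(broken_offsets, 4))  # (15, 15)
```

With zero-width spans, no character index can satisfy start <= index < end, which is consistent with char_to_token returning None for every index rather than only for 0.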