Closed by scissorstail 7 months ago
Hey! This might be related to the clean_up_tokenization_spaces argument available in transformers. Would you mind trying tokenizer.decode(encoding, clean_up_tokenization_spaces=False)? If that does not work, could you push the tokenizer to the hub?
It seems to be working. However, I'm not quite sure what this option does. Thanks anyway.
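For context, here is a rough sketch of the kind of cleanup this option controls. The exact rules live inside transformers, but the behavior is roughly: collapse the extra spaces that word-level tokenization inserts around punctuation and English contractions. The function below is an illustrative approximation, not the library's actual implementation.

```python
def clean_up_tokenization(text: str) -> str:
    """Remove spaces that tokenization inserts before punctuation and
    contractions. A simplified sketch of what transformers does when
    clean_up_tokenization_spaces is True (not the exact library code)."""
    replacements = [
        (" .", "."), (" ?", "?"), (" !", "!"), (" ,", ","),
        (" n't", "n't"), (" 'm", "'m"), (" 's", "'s"),
        (" 've", "'ve"), (" 're", "'re"),
    ]
    for old, new in replacements:
        text = text.replace(old, new)
    return text

# A decoded string with tokenization artifacts:
decoded = "Hello , world ! I 'm fine ."
print(clean_up_tokenization(decoded))  # Hello, world! I'm fine.
```

This also shows why disabling the cleanup can matter: if your original text legitimately contains a space before a punctuation mark, the cleanup pass will silently remove it, which looks like a missing whitespace character after decoding.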
It seems that whitespace characters at a specific position are not being displayed.
To Reproduce
Please run the sample below.
sample.txt
sample.py
Expected behavior
Actual behavior
I might have configured something incorrectly. This behavior is different from what I was expecting.