Open zpp13 opened 4 weeks ago
Hey! The trie is mostly for internal usage, so I think this is expected! (We don't take spaces into account, AFAIK.)
This error occurs not only with spaces, for example:
@ArthurZucker and @itazap
Hey, sorry, but the trie is not for general-purpose usage. If it is breaking a tokenizer then sure, let's fix it, but that does not seem to be the case!
System Info
transformers version: 4.41.0
Who can help?
@ArthurZucker and @itazap
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
from transformers.tokenization_utils import Trie
trie = Trie()
trie.add("abc")
trie.add("b")
trie.split("ab cd")
['ab c', 'd']
Expected behavior
In my opinion, this should return ['a', 'b', 'cd'], but it returns ['ab c', 'd'].
This is my first submission, so sorry if I have misunderstood something.
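For comparison, here is a minimal pure-Python sketch of a reference splitter that could be used to sanity-check the reproduction above. It is not the `Trie` implementation from `transformers`; the function name `naive_split` is hypothetical, and it assumes a "leftmost match, then longest token" rule, which may differ from the library's intended semantics. Because it never drops characters, it keeps the space, so it gives `' cd'` rather than `'cd'` for the last piece.

```python
def naive_split(text, tokens):
    """Split `text` around added tokens, scanning left to right.

    Hypothetical reference helper (not part of transformers): at each
    position it takes the longest token that matches exactly there.
    """
    pieces = []
    pending_start = 0  # start of the current run of unmatched characters
    i = 0
    while i < len(text):
        # Longest non-empty token that starts exactly at position i, if any.
        match = max(
            (t for t in tokens if t and text.startswith(t, i)),
            key=len,
            default=None,
        )
        if match is None:
            i += 1
            continue
        # Flush the unmatched text before the token, then the token itself.
        if pending_start < i:
            pieces.append(text[pending_start:i])
        pieces.append(match)
        i += len(match)
        pending_start = i
    # Flush whatever unmatched text remains at the end.
    if pending_start < len(text):
        pieces.append(text[pending_start:])
    return pieces


print(naive_split("ab cd", ["abc", "b"]))  # ['a', 'b', ' cd']
```

Under these assumptions, "abc" never matches in "ab cd" because the space interrupts it, so only "b" is cut out. Joining the pieces always gives back the original string, which is a cheap invariant to check against any splitter.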