sandertan closed 3 years ago
Interesting, because when I set add_prefix_space=True I get the following:
Although the output is incorrect, it seems to handle spaces at the beginning of the sentence correctly.
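For anyone reading along, here is a rough sketch of why the flag matters. It is not the actual tokenizers implementation: the split pattern is a simplified stand-in I am assuming for illustration, and pretokenize is a hypothetical helper name. The idea is that byte-level BPE attaches the leading space to each word (shown as Ġ), so without a prefix space the first word of a sentence looks different from the same word mid-sentence.

```python
import re
from typing import List

# Simplified stand-in for the GPT-2 style split pattern (assumption: the real
# pattern also treats contractions, digits and punctuation separately).
_SPLIT = re.compile(r" ?\S+|\s+")


def pretokenize(text: str, add_prefix_space: bool = False) -> List[str]:
    """Hypothetical helper: split text into word pieces, marking spaces as Ġ."""
    # With the flag set, a space is prepended unless the text already starts
    # with one, so the first word is treated like any other word.
    if add_prefix_space and not text.startswith(" "):
        text = " " + text
    return [piece.replace(" ", "Ġ") for piece in _SPLIT.findall(text)]


print(pretokenize("hello world"))                          # ['hello', 'Ġworld']
print(pretokenize(" hello world"))                         # ['Ġhello', 'Ġworld']
print(pretokenize("hello world", add_prefix_space=True))   # ['Ġhello', 'Ġworld']
```

With the flag off, "hello" and " hello" produce different first pieces, which would map to different vocabulary entries. With the flag on, both inputs give the same pieces.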
You're right, add_prefix_space=True was not set when loading the tokenizer from file. I added functionality to MetaCAT for that in https://github.com/CogStack/MedCAT/pull/75 . I get the same results now.
The change was merged into MedCAT master. If you pull it, you will be able to run this code.
Weird behavior with/without a space at start of sentence and ByteLevelBPETokenizer(add_prefix_space=True)