kaykay-dv closed this issue 1 year ago
Since the recent introduction of the Tokenizer classes, the default tokenizer does not seem to remove diacritics even when explicitly told to do so. So far this behavior has only been observed on Linux/Ubuntu machines; it works fine on a Mac.
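For readers unfamiliar with the expected behavior, here is a minimal sketch of what "removing diacritics" usually means. It is not this project's Tokenizer: the names `strip_diacritics`, `tokenize`, and the `remove_diacritics` flag are hypothetical and only illustrate the idea using Python's standard `unicodedata` module.

```python
import unicodedata
from typing import List


def strip_diacritics(text: str) -> str:
    """Return text with combining marks (accents, umlauts, ...) removed."""
    # NFD splits each precomposed character into a base character
    # followed by its combining marks, e.g. "é" -> "e" + U+0301.
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text: str, remove_diacritics: bool = True) -> List[str]:
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    if remove_diacritics:
        text = strip_diacritics(text)
    return text.lower().split()
```

Because the input is normalized before the marks are dropped, precomposed and decomposed forms of the same character produce the same token, so a sketch like this should behave the same regardless of how the platform or editor encoded the test strings.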
This seems to be related to the unit tests themselves and the order in which they are executed.
Fixed by restructuring the unit tests.