Problem with default settings of tokenizer

kaykay-dv / pocketsearch

A simple full-text search library for Python using SQLite and its FTS5 extension

https://pocketsearch.readthedocs.io/en/latest/

MIT License

Problem with default settings of tokenizer #37

Closed by kaykay-dv 1 year ago

kaykay-dv commented 1 year ago

Since the recent introduction of the Tokenizer classes, the default tokenizer does not appear to remove diacritics, even though it is explicitly configured to do so. So far this behavior has only been observed on Linux/Ubuntu machines; it works fine on macOS.
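
To check the expected behavior independently of pocketsearch, the following is a minimal sketch that talks to SQLite's FTS5 directly through Python's `sqlite3` module. It assumes the local SQLite build includes FTS5 and that the tokenizer in question maps to FTS5's `unicode61` tokenizer with its `remove_diacritics` option. The helper name `matches` and the table name `docs` are hypothetical and not part of pocketsearch.

```python
import sqlite3


def matches(remove_diacritics: int, text: str, query: str) -> bool:
    """Index `text` in a throwaway FTS5 table and report whether `query` hits.

    Hypothetical helper for illustration only; it bypasses pocketsearch and
    uses the unicode61 tokenizer directly.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # remove_diacritics 0 keeps diacritics, 1 strips them from Latin characters.
        conn.execute(
            "CREATE VIRTUAL TABLE docs USING fts5("
            f"body, tokenize = 'unicode61 remove_diacritics {int(remove_diacritics)}')"
        )
        conn.execute("INSERT INTO docs (body) VALUES (?)", (text,))
        row = conn.execute(
            "SELECT count(*) FROM docs WHERE docs MATCH ?", (query,)
        ).fetchone()
        return row[0] > 0
    finally:
        conn.close()


if __name__ == "__main__":
    # With diacritics removed, "cafe" should find "café"; without, it should not.
    print(matches(1, "caf\u00e9", "cafe"))
    print(matches(0, "caf\u00e9", "cafe"))
```

Each call uses a fresh in-memory database, so the result does not depend on any state left behind by a previous call. If both platforms give the same answers here, the difference is more likely in how the library or its tests set up the tokenizer than in SQLite itself.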

kaykay-dv commented 1 year ago

This seems to be related to the unit tests themselves and the order in which they are executed, rather than to the tokenizer.

kaykay-dv commented 1 year ago

Fixed by restructuring the unit tests.