karlb / wikdict-web

Web front end for WikDict dictionaries
https://www.wikdict.com
MIT License

Check ICU handling in sqlite #14

Closed by karlb 2 years ago

karlb commented 3 years ago

See discussion in #10.

karlb commented 3 years ago

This should help https://github.com/karlb/sqlite-icu.

karlb commented 2 years ago

> This should help https://github.com/karlb/sqlite-icu.

Unfortunately, FTS3 itself also has to be compiled with ICU support to get access to the ICU tokenizer. Compiling FTS3 as a loadable extension is theoretically possible, but that route is no longer maintained. See https://sqlite.org/forum/forumpost/eef52c5f33
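For anyone who wants to check their own setup, here is a rough sketch of how one might probe whether the SQLite library linked into Python offers the ICU tokenizer. The helper names and the `icu_probe` table name are made up for illustration, and the probe assumes that a build without ICU (or without FTS4) rejects the `CREATE VIRTUAL TABLE` statement with an `OperationalError`.

```python
import sqlite3


def compile_options(conn):
    """Return the compile-time options of the linked SQLite library as a set."""
    return {row[0] for row in conn.execute("PRAGMA compile_options")}


def has_icu_tokenizer(conn):
    """Try to create a throwaway FTS4 table that uses the ICU tokenizer.

    Hypothetical helper: 'icu_probe' is just a scratch table name.
    """
    try:
        conn.execute("CREATE VIRTUAL TABLE icu_probe USING fts4(x, tokenize=icu)")
    except sqlite3.OperationalError:
        # Raised when the tokenizer (or the FTS4 module itself) is unavailable.
        return False
    conn.execute("DROP TABLE icu_probe")
    return True


conn = sqlite3.connect(":memory:")
print("ENABLE_ICU" in compile_options(conn), has_icu_tokenizer(conn))
```

On most stock builds both values will probably come out as `False`, which matches the problem described above.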

Using the generic Unicode tokenizer seems to work well enough for most cases, so I will just keep it simple and ignore ICU support for now. This might have downsides for languages that don't use spaces to separate words (e.g. Japanese), but I am not proficient enough in any of these to actually judge the impact.
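To illustrate the trade-off, here is a minimal sketch. It assumes that the "generic Unicode tokenizer" corresponds to SQLite's `unicode61` tokenizer and that the linked SQLite build has FTS4 enabled. The `entry` table, the `written_rep` column, and the sample rows are invented for the example and are not taken from the WikDict schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption: FTS4 and the unicode61 tokenizer are available in this build.
conn.execute(
    "CREATE VIRTUAL TABLE entry USING fts4(written_rep, tokenize=unicode61)"
)
conn.executemany(
    "INSERT INTO entry (written_rep) VALUES (?)",
    [("Café au lait",), ("日本語の辞書",)],
)


def search(term):
    """Return all entries whose tokens match the given search term."""
    rows = conn.execute(
        "SELECT written_rep FROM entry WHERE entry MATCH ? ORDER BY docid",
        (term,),
    )
    return [row[0] for row in rows]


print(search("cafe"))          # case and diacritics are folded by unicode61
print(search("辞書"))           # a word inside an unspaced Japanese phrase
print(search("日本語の辞書"))    # the full phrase as one token
```

With `unicode61`, the search for `cafe` should find the accented entry, while the search for `辞書` should come back empty, because the whole Japanese phrase contains no separators and ends up as a single token. That is the kind of gap an ICU tokenizer would be expected to close.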