kemdict / kemdict

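Kemdict integrates multiple dictionaries so they can all be searched at once. It also includes another dictionary that I use to record words the Ministry of Education has not included.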
https://kemdict.com

Mixed Unicode normalization causing seemingly equal searches to not return results #11

Closed kisaragi-hiu closed 1 year ago

kisaragi-hiu commented 1 year ago

The problem is that SQLite's COLLATE NOCASE only knows about ASCII characters. We can't just downcase everything, because case matters for some words ("AI", for instance). We need to apply NFD normalization before saving into the database.

kisaragi-hiu commented 1 year ago

Actually, SQLite's LIKE is case-insensitive by default (for ASCII characters, at least).
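Both claims are easy to check. The following is a minimal sketch using Python's built-in `sqlite3` module against an in-memory database; it is not Kemdict's own code, and the `scalar` helper is a hypothetical name introduced just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def scalar(sql: str, *params: str) -> int:
    """Run a one-value query and return that value (hypothetical helper)."""
    return conn.execute(sql, params).fetchone()[0]


# COLLATE NOCASE folds only the 26 ASCII letters.
ascii_nocase = scalar("SELECT ? = ? COLLATE NOCASE", "AI", "ai")
# U+01F8 / U+01F9 are the upper- and lowercase forms of N with grave.
non_ascii_nocase = scalar("SELECT ? = ? COLLATE NOCASE", "\u01f8g", "\u01f9g")

# LIKE ignores ASCII case unless PRAGMA case_sensitive_like is turned on.
ascii_like = scalar("SELECT ? LIKE ?", "AI", "ai")
# Outside ASCII the result depends on the build: stock SQLite treats these
# as different, while a build with the ICU extension may not.
non_ascii_like = scalar("SELECT ? LIKE ?", "\u01f8g", "\u01f9g")

print(ascii_nocase, non_ascii_nocase, ascii_like, non_ascii_like)
```

On a stock SQLite build, the ASCII comparisons match regardless of case while the NOCASE comparison of the non-ASCII pair does not, which is consistent with the two comments above.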

The problem I actually faced was that searching for Ǹg-bāng returned no results even though it should have. I thought this was a casing issue, but once I ensured consistent Unicode normalization, it started working.
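To illustrate why normalization matters here, the sketch below stores one form of the word and searches with the other. It is a hedged reconstruction and not the actual fix: the `entries` table, the `title` column, and the `to_nfd` and `search` helpers are all assumptions made for the example, using Python's standard `sqlite3` and `unicodedata` modules.

```python
import sqlite3
import unicodedata


def to_nfd(text: str) -> str:
    """Normalize text to NFD (hypothetical helper for this sketch)."""
    return unicodedata.normalize("NFD", text)


# The same visible word in two Unicode forms.
composed = "\u01f8g-b\u0101ng"      # precomposed letters (NFC)
decomposed = "N\u0300g-ba\u0304ng"  # base letters + combining marks (NFD)


def search(stored: str, query: str) -> list[str]:
    """Store one title in a throwaway table and look it up with LIKE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (title TEXT)")  # assumed schema
    conn.execute("INSERT INTO entries (title) VALUES (?)", (stored,))
    rows = conn.execute(
        "SELECT title FROM entries WHERE title LIKE ?", (query,)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Mixed normalization: the strings look identical but differ in code points.
mixed_hits = search(composed, decomposed)

# Normalizing both the stored value and the query makes them comparable.
normalized_hits = search(to_nfd(composed), to_nfd(decomposed))

print(len(composed), len(decomposed))  # different code point counts
print(mixed_hits, normalized_hits)
```

The design point is that normalization has to happen on both sides: once when writing entries to the database and again on every incoming query. NFC would work equally well as long as the same form is used consistently.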

With the issue resolved, smart case is probably not necessary after all. "AI" and other loanwords from English acronyms are probably the only cases where a word has just one correct casing, and they're unique enough not to be buried among entries with the wrong casing.

kisaragi-hiu commented 1 year ago

Updated the title because I kept getting confused about what has been done and what hasn't.