Closed. inostia closed this issue 9 years ago.
+100. This is a really good idea.
At a quick glance around Google, I'm not sure whether this is really easy or really hard. Feel free to look around for solutions, or contribute one of your own.
FYI: I'm currently using SQLite 3.8.11 as my storage engine in the deployment, with FTS3 & FTS4 enabled. My ORM is Peewee, and I'm using the "porter" tokenizer in SQLite.
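To see why the "porter" setup misses these matches, here is a minimal sketch using Python's stdlib `sqlite3` against an in-memory FTS4 table. The table name `artists` and column `name` are hypothetical, and it assumes your SQLite build has FTS4 enabled. The porter tokenizer only folds ASCII case, so "Köner" is indexed with its "ö" intact and a query for "Koner" finds nothing.

```python
import sqlite3
from typing import List


def porter_matches(query: str, doc: str) -> List[str]:
    """Index one document in an FTS4 table with tokenize=porter and search it."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema: a single-column FTS4 table of artist names.
        conn.execute("CREATE VIRTUAL TABLE artists USING fts4(name, tokenize=porter)")
        conn.execute("INSERT INTO artists(name) VALUES (?)", (doc,))
        rows = conn.execute(
            "SELECT name FROM artists WHERE artists MATCH ?", (query,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(porter_matches("Koner", "Thomas Köner"))  # [] : no diacritic folding
print(porter_matches("Köner", "Thomas Köner"))  # ['Thomas Köner']
```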
This article sounds helpful: http://www.swwritings.com/post/2013-05-04-diacritics-and-fts/
I've done some experiments. Using the "unicode61" FTS tokenizer in SQLite solves the diacritics problem.
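A minimal sketch of the unicode61 fix, assuming a SQLite build with FTS4 and the unicode61 tokenizer available (the `artists`/`name` schema is again hypothetical). By default unicode61 folds case and strips diacritics from Latin-script letters, so "Köner" and "Koner" index to the same token.

```python
import sqlite3
from typing import List, Sequence


def unicode61_matches(query: str, docs: Sequence[str]) -> List[str]:
    """Index docs in an FTS4 table with tokenize=unicode61 and return matches."""
    conn = sqlite3.connect(":memory:")
    try:
        # unicode61 defaults to remove_diacritics=1, so "ö" is indexed as "o".
        conn.execute("CREATE VIRTUAL TABLE artists USING fts4(name, tokenize=unicode61)")
        conn.executemany("INSERT INTO artists(name) VALUES (?)", [(d,) for d in docs])
        rows = conn.execute(
            "SELECT name FROM artists WHERE artists MATCH ? ORDER BY docid", (query,)
        ).fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


print(unicode61_matches("Koner", ["Thomas Köner", "Thomas Koner", "Brian Eno"]))
# ['Thomas Köner', 'Thomas Koner']
```

One trade-off: an FTS4 table takes a single tokenizer, so switching from porter to unicode61 gives up stemming. Later SQLite releases with FTS5 let you wrap one in the other (`tokenize='porter unicode61'`).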
However, the SQLite library bundled with out-of-the-box Python wasn't built with the flags needed to make this tokenizer work.
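To check whether a given Python's SQLite supports the tokenizer before relying on it, one option is to simply try creating a throwaway FTS4 table and catch the error. This is a sketch; the helper name `supports_fts4_tokenizer` is hypothetical, and the tokenizer name is interpolated into SQL, so it should only ever be a trusted, hard-coded string.

```python
import sqlite3


def supports_fts4_tokenizer(tokenizer: str) -> bool:
    """Return True if the SQLite library behind the sqlite3 module accepts
    `tokenizer` for an FTS4 table. Only pass trusted, hard-coded names."""
    conn = sqlite3.connect(":memory:")  # throwaway DB, so nothing to clean up
    try:
        conn.execute(f"CREATE VIRTUAL TABLE probe USING fts4(x, tokenize={tokenizer})")
        return True
    except sqlite3.OperationalError:
        # e.g. "no such module: fts4" or "unknown tokenizer: unicode61"
        return False
    finally:
        conn.close()


print("SQLite", sqlite3.sqlite_version, "unicode61:", supports_fts4_tokenizer("unicode61"))
```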
The solution? Don't use Python's built-in sqlite3 module, but APSW instead. APSW downloads and compiles in its own copy of SQLite, independent of any system installation. It also lets you set all of the flags for fancy features, including the "unicode61" tokenizer.
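A rough sketch of how code can prefer APSW when it is installed and fall back to the stdlib module otherwise. It uses only `apsw.Connection(...)`, `Connection.cursor()` and `Cursor.execute(sql, bindings)`, which both libraries provide; the helper names and schema are hypothetical. Keep in mind the fallback only helps if the stdlib's SQLite already has unicode61, which is exactly what was missing here.

```python
import sqlite3
from typing import List, Sequence

try:
    import apsw  # third-party; can be built against its own bundled SQLite
except ImportError:
    apsw = None


def open_db(path: str = ":memory:"):
    """Hypothetical helper: prefer an APSW connection, else stdlib sqlite3."""
    if apsw is not None:
        return apsw.Connection(path)
    return sqlite3.connect(path)


def load_and_search(conn, names: Sequence[str], query: str) -> List[str]:
    """Create a unicode61 FTS4 table, load names, and run one MATCH query.
    Works with either connection type because both expose cursor().execute()."""
    cur = conn.cursor()
    cur.execute("CREATE VIRTUAL TABLE artists USING fts4(name, tokenize=unicode61)")
    for name in names:
        cur.execute("INSERT INTO artists(name) VALUES (?)", (name,))
    rows = cur.execute(
        "SELECT name FROM artists WHERE artists MATCH ? ORDER BY docid", (query,)
    )
    return [row[0] for row in rows]


print(load_and_search(open_db(), ["Thomas Köner", "Brian Eno"], "Koner"))
```

Since the deployment uses Peewee, its `playhouse.apsw_ext` module (with an `APSWDatabase` class) is worth checking as the way to plug APSW in at the ORM level.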
Great! The blog post you pointed to seems pretty straightforward. Thanks for this; the project's awesome.
Fixed. Updated DB will go live later today.
E.g., Thomas Koner should return results for Thomas Köner.
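For comparison, the same requirement can be expressed outside the database: decompose the text, drop combining marks, and case-fold. This is an application-side alternative, not what the thread ended up using; it could, for instance, feed a separate folded search column. The function name is hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """NFKD-decompose, drop combining marks (e.g. U+0308), then case-fold."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


print(fold_diacritics("Thomas Köner"))  # thomas koner
```

Caveat: letters without a decomposition, such as "ø", pass through unchanged, so this is not a complete transliteration.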
I've never contributed to an open source project but I could try to implement this.