Open ageorgou opened 6 years ago
Some notes from looking into how we can do this without coding from scratch:
The localeCompare method of strings is useful, but by default it sorts š before ş, which is not the order we want. Its behaviour may also depend on the user's browser (and location? but I think not). Failing this, we can just build the string comparison from scratch based on our desired ordering.
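For reference, a minimal sketch of the from-scratch option. It assumes a hypothetical alphabet in which ş comes directly after s and š directly after ş; the real order and character set still need to be agreed. ALPHABET, rankOf and compareByAlphabet are illustrative names, not existing code.

```javascript
// Hypothetical ordering: the real alphabet still has to be decided.
const ALPHABET = "abcdefghijklmnopqrsşštuvwxyz".normalize("NFC");

// Map each character to its rank in the alphabet.
const RANK = new Map([...ALPHABET].map((ch, i) => [ch, i]));

// Characters outside the alphabet all rank after the known ones.
function rankOf(ch) {
  const r = RANK.get(ch);
  return r === undefined ? ALPHABET.length : r;
}

// Comparator for Array.prototype.sort; case-insensitive, NFC-normalised.
function compareByAlphabet(a, b) {
  const x = [...a.normalize("NFC").toLowerCase()];
  const y = [...b.normalize("NFC").toLowerCase()];
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    const diff = rankOf(x[i]) - rankOf(y[i]);
    if (diff !== 0) return diff;
  }
  // A word sorts before any longer word it is a prefix of.
  return x.length - y.length;
}

const words = ["šar", "sar", "şar", "tar"];
console.log([...words].sort(compareByAlphabet));
```

This avoids any dependence on the browser's locale data, at the cost of having to list every supported character ourselves.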
We need to check which non-ASCII characters we want to support; the Oracc docs include more characters than the ones mentioned above.
On the back-end side, the ICU plugin for ElasticSearch may be of interest.
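As a rough sketch of what that could look like, here is an index mapping body written as a JavaScript object. It assumes the analysis-icu plugin is installed and uses its icu_collation_keyword field type; the field name headword and the tailoring rule are hypothetical, and the rules parameter should be checked against the plugin documentation for our ElasticSearch version before relying on it.

```javascript
// Mapping body for an index; `headword` is a hypothetical field name.
const mappingBody = {
  mappings: {
    properties: {
      headword: {
        type: "text",
        fields: {
          // Sub-field used only for sorting.
          sort: {
            type: "icu_collation_keyword",
            index: false,
            // Assumed tailoring in ICU rule syntax: s, then ş, then š.
            rules: "&s < ş < š",
          },
        },
      },
    },
  },
};

// Sort clause of a search request that uses the collated sub-field.
const sortClause = [{ "headword.sort": "asc" }];

console.log(JSON.stringify(mappingBody, null, 2));
console.log(JSON.stringify(sortClause));
```

If this works as expected, the ordering would live in the index definition rather than in front-end code.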
Putting this here because it's surprisingly hard to search for: The Unicode default collation chart (DUCET - Default Unicode Collation Element Table): http://unicode.org/charts/collation/. This is the sorting currently chosen in the backend.
So, what do we actually want now? We had plain ASCII searching, which was bad; now we're back to English collation. Perhaps we want to adjust it somewhat? Or maybe it's fine?
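To make the difference concrete on the JavaScript side, here is a small sketch comparing the default code-unit sort with English collation through the built-in Intl.Collator. The word list is made up, and the collated result depends on the ICU data shipped with the runtime, so only the code-unit result is annotated.

```javascript
// Made-up sample words.
const sample = ["zebra", "Apple", "šar", "sar"];

// Default sort compares UTF-16 code units: uppercase first, non-ASCII last.
const byCodeUnit = [...sample].sort();
console.log(byCodeUnit); // [ 'Apple', 'sar', 'zebra', 'šar' ]

// English collation; š is expected to sort near s rather than after z.
const english = new Intl.Collator("en");
const byEnglish = [...sample].sort(english.compare);
console.log(byEnglish);
```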
From the meeting on 1 May:
Essentially:
The last point could be clarified a bit. For example, does AB-C come after ADE? In other words, does any "-" make the word appear at the end, or is it just a character sorted after z?
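The two readings can be sketched as comparators to show where they disagree. This is an ASCII-only illustration with hypothetical function names; it is not a proposal for the final implementation.

```javascript
// Reading 1: "-" is just another character, ranked directly after "z".
function rank(ch) {
  return ch === "-" ? "z".charCodeAt(0) + 1 : ch.toLowerCase().charCodeAt(0);
}

function compareHyphenAsLetter(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const diff = rank(a[i]) - rank(b[i]);
    if (diff !== 0) return diff;
  }
  return a.length - b.length;
}

// Reading 2: any word containing "-" goes after all words without one.
function compareHyphenatedLast(a, b) {
  const ha = a.includes("-");
  const hb = b.includes("-");
  if (ha !== hb) return ha ? 1 : -1;
  return compareHyphenAsLetter(a, b);
}

const entries = ["ADE", "AB-C", "ABZ"];
console.log([...entries].sort(compareHyphenAsLetter)); // [ 'ABZ', 'AB-C', 'ADE' ]
console.log([...entries].sort(compareHyphenatedLast)); // [ 'ABZ', 'ADE', 'AB-C' ]
```

Under the first reading AB-C comes before ADE; under the second it comes after.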