Search: Normalization for transcribed text

See #5 for context. No normalization will be implemented for the MVP phase, but it would be a nice-to-have.

Both Solr and Postgres appear to have acceptable support for normalizing English text. For Latvian, Solr provides a rudimentary normalization algorithm in the form of a stemmer and a stopword list, both sourced from Kārlis Krēsliņš' '96 thesis. Presumably these would need to be ported in some form; preliminary research suggests Postgres supports text search dictionaries, which are intended for this purpose.
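To make the Postgres side a bit more concrete, here is a minimal sketch of what wiring up a stopword dictionary might look like. It only builds the SQL statements as strings and does not touch a database. The names `latvian_stop`, `latvian_simple`, and the stopword file name `latvian` are hypothetical, and it assumes a `latvian.stop` file has been placed in Postgres' `tsearch_data` directory. This uses the built-in `simple` template, which lowercases tokens and drops stopwords; it does not stem, so the stemmer would still need a separate solution.

```python
def latvian_dictionary_sql(
    dict_name: str = "latvian_stop",      # hypothetical dictionary name
    stopwords_file: str = "latvian",      # assumes $SHAREDIR/tsearch_data/latvian.stop exists
    config_name: str = "latvian_simple",  # hypothetical configuration name
) -> list[str]:
    """Build SQL that registers a stopword-only dictionary and a config using it."""
    return [
        # The `simple` template lowercases each token and discards stopwords.
        f"CREATE TEXT SEARCH DICTIONARY {dict_name} "
        f"(TEMPLATE = pg_catalog.simple, STOPWORDS = {stopwords_file});",
        # Start from the built-in `simple` configuration (parser and mappings).
        f"CREATE TEXT SEARCH CONFIGURATION {config_name} "
        f"(COPY = pg_catalog.simple);",
        # Route plain word tokens through the new dictionary.
        f"ALTER TEXT SEARCH CONFIGURATION {config_name} "
        f"ALTER MAPPING FOR asciiword, word WITH {dict_name};",
    ]


if __name__ == "__main__":
    for statement in latvian_dictionary_sql():
        print(statement)
```

If this works as hoped, `to_tsvector('latvian_simple', ...)` should then skip whatever words are listed in the stopword file, but I haven't tried it against a real instance.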

Alternatively, algorithms from AILab's projects (cf. also) could perhaps be used instead, given that they've had a couple more years of effort put into them, but I'm not sure where to begin with those.

untitled-pit-group/foxhound#17