manastech / middleman-search

LunrJS-based search for Middleman
MIT License

Search should be accents-insensitive #1

Closed matiasgarciaisaia closed 8 years ago

matiasgarciaisaia commented 8 years ago

If you index a page containing an accented word such as computación and then search for it without the accent (i.e., computacion), the page is not listed in the results.

We should make LunrJS accent-insensitive.

See olivernn/lunr.js#16

spalladino commented 8 years ago

I experimented with including stemmers for different languages from https://github.com/MihaiValentin/lunr-languages (that work is now in the i18n branch). In the end I opted to let the user customise the pipeline instead, since lunr-languages was designed to handle a single language (as far as I understood).

Commit a58f1dbe623aeb7fa6d174c5b20766e4dc78bf09, in version 0.3.0, allows the user to inject a custom function like the following, which keeps all the English-based stemmers but ignores accents when indexing and when performing searches:

search.pipeline = {
  tildes: <<-JS
    function(token, tokenIndex, tokens) {
      // Global regexes are needed here: a string pattern only replaces the first match.
      return token
        .replace(/á/g, 'a')
        .replace(/é/g, 'e')
        .replace(/í/g, 'i')
        .replace(/ó/g, 'o')
        .replace(/ú/g, 'u')
        .replace(/Á/g, 'A')
        .replace(/É/g, 'E')
        .replace(/Í/g, 'I')
        .replace(/Ó/g, 'O')
        .replace(/Ú/g, 'U');
    }
  JS
}
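
The function above lists each accented vowel explicitly. As an alternative sketch (not what the commit ships), the same effect could probably be had with Unicode normalization: decompose each character with String.prototype.normalize('NFD') and drop the combining marks. The name stripAccents is hypothetical, and this assumes tokens reach the pipeline as plain strings, as in the snippet above. Note that it is broader than the original: it also turns ñ into n and ü into u, which may or may not be wanted.

```javascript
// Hypothetical helper, not part of middleman-search or lunr.js.
// Assumes the token is a plain string and the runtime supports ES2015 normalize().
function stripAccents(token) {
  return token
    .normalize('NFD')                 // split "ó" into "o" + combining acute accent
    .replace(/[\u0300-\u036f]/g, ''); // remove every combining diacritical mark
}

console.log(stripAccents('computación')); // computacion
console.log(stripAccents('ÁÉÍÓÚ'));       // AEIOU
```

If that behaviour is acceptable, the body of stripAccents could stand in for the chain of replace calls inside the heredoc.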