jolicode / emoji-search

:smile: Emoji synonyms to build your own emoji-capable search engine (elasticsearch, solr, OpenSearch)
https://jolicode.com/blog/elasticsearch-icu-now-understands-emoji
MIT License

Mark the synonym token filter as updateable and provide a better example #25

Open damienalexandre opened 4 years ago

damienalexandre commented 4 years ago

Reading https://www.elastic.co/blog/boosting-the-power-of-elasticsearch-with-synonyms, it is clear that Emoji Search can benefit from the new reload API, POST /synonym_test/_reload_search_analyzers (where synonym_test is an example index name). The post says:

Index-time synonyms have several disadvantages:

  • The index might get bigger, because all synonyms must be indexed.
  • Search scoring, which relies on term statistics, might suffer because synonyms are also counted, and the statistics for less common words become skewed.
  • Synonym rules can’t be changed for existing documents without reindexing.

...

Using synonyms in search-time analyzers on the other hand doesn’t have many of the above mentioned problems:

  • The index size is unaffected.
  • The term statistics in the corpus stay the same.
  • Changes in the synonym rules don’t require reindexing of documents.

And:

Starting with Elasticsearch 7.3, this reopening of indices in order to see changes in synonym files is no longer needed.
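
As a rough sketch of what this could look like for Emoji Search: the snippet below builds the index settings for a search-time analyzer whose synonym filter is marked `updateable`, plus the path of the reload call. The index name, the filter and analyzer names, the `content` field and the synonyms file path are all assumptions made for illustration. The `icu_tokenizer` assumes the analysis-icu plugin is installed. Nothing is sent to a cluster here; the dictionaries are only the request bodies.

```python
import json

# Hypothetical names used for illustration only.
INDEX = "synonym_test"
SYNONYMS_PATH = "analysis/cldr-emoji-annotation-synonyms-en.txt"  # assumed location


def build_index_body(synonyms_path: str) -> dict:
    """Index body with emoji synonyms applied at search time only."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "emoji_synonyms": {
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                        # Lets _reload_search_analyzers pick up file changes.
                        # Updateable filters are only allowed in search analyzers.
                        "updateable": True,
                    }
                },
                "analyzer": {
                    "emoji_search": {
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",
                        "filter": ["lowercase", "emoji_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "content": {
                    "type": "text",
                    # No synonyms at index time, so no reindexing on rule changes.
                    "analyzer": "standard",
                    "search_analyzer": "emoji_search",
                }
            }
        },
    }


def reload_path(index: str) -> str:
    """Path to POST to after editing the synonyms file on every node."""
    return f"/{index}/_reload_search_analyzers"


if __name__ == "__main__":
    print(f"PUT /{INDEX}")
    print(json.dumps(build_index_body(SYNONYMS_PATH), indent=2))
    print(f"POST {reload_path(INDEX)}")
```

With a setup like this, updating the emoji synonyms would mean replacing the file on each node and calling the reload endpoint, with no close/open of the index and no reindexing.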

We must:

damienalexandre commented 3 years ago

Using the synonym_graph token filter instead of synonym, applied at search time only, could also be better.

https://www.adelean.com/blog/20210421_synonym_graph_in_elasticsearch/
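
A minimal sketch of that variant, assuming the same kind of file-based setup: only the filter type changes to `synonym_graph`, which is meant for search analyzers and handles multi-word synonyms correctly. The filter name, analyzer name and file path below are hypothetical.

```python
def build_graph_analysis(synonyms_path: str) -> dict:
    """Analysis settings using synonym_graph in a search-time analyzer."""
    return {
        "filter": {
            "emoji_synonyms_graph": {  # hypothetical filter name
                "type": "synonym_graph",
                "synonyms_path": synonyms_path,
                "updateable": True,  # search-time only, reloadable
            }
        },
        "analyzer": {
            "emoji_search_graph": {  # hypothetical analyzer name
                "type": "custom",
                "tokenizer": "icu_tokenizer",  # assumes the analysis-icu plugin
                "filter": ["lowercase", "emoji_synonyms_graph"],
            }
        },
    }


if __name__ == "__main__":
    import json

    # Assumed file location, for illustration.
    print(json.dumps(build_graph_analysis("analysis/emoji-synonyms.txt"), indent=2))
```

The resulting analyzer would be referenced as a field's `search_analyzer`, never as its index-time `analyzer`.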