vicchi closed this issue 4 years ago.
Hi!
It seems there is an issue with the synonym files and ICU. This error happens when the tokenizer completely removes a string, and `*` and `✓` are neither emoji nor "text".
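Elasticsearch parses each synonym rule with the analysis chain that precedes the synonym filter. So a term that the ICU tokenizer drops entirely leaves nothing to map. One way to spot the affected rules before loading a file is to look for right-hand-side terms that contain no letters or digits.

The sketch below is a rough heuristic, built on these assumptions:

- Rules use the `lhs => term1, term2, ...` format seen in the files.
- "No letter or number code point" stands in for "the tokenizer drops it". This is not what ICU actually does, so it may over-report.
- `find_droppable_terms` is a hypothetical helper.

```python
import unicodedata


def has_word_char(term: str) -> bool:
    """True if the term contains at least one letter (L*) or number (N*) code point."""
    return any(unicodedata.category(ch)[0] in ("L", "N") for ch in term)


def find_droppable_terms(lines):
    """Return (line_number, term) pairs for right-hand-side synonyms that have
    no letters or digits and are not the emoji on the left-hand side.

    Heuristic assumption: terms like '*' or '✓' may be removed entirely by the
    tokenizer. Another emoji on the right-hand side would also be reported,
    so results need manual review.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an explicit mapping.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        lhs, rhs = (part.strip() for part in line.split("=>", 1))
        for term in (t.strip() for t in rhs.split(",")):
            # The left-hand emoji itself is assumed to survive tokenization.
            if term and term != lhs and not has_word_char(term):
                problems.append((lineno, term))
    return problems


sample = ["# emoji synonyms", "✅ => ✅, ✓, button, check, mark"]
print(find_droppable_terms(sample))  # [(2, '✓')]
```

To list candidate line numbers for review, call `find_droppable_terms(open(path, encoding="utf-8"))` on a real file such as `cldr-emoji-annotation-synonyms-en.txt`.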
Thanks for letting us know. I will work on a patch, and I also wish to add tests (see #12) to avoid any issues like this in future Elasticsearch releases.
@damienalexandre Thank you!
This is resolved in https://github.com/jolicode/emoji-search/commit/e5309a88cf25d7a6e3c81568af4c7509b6012442. I fixed all the files in all languages and added automated tests that check them on each change.
Thanks for reporting the issue!
Cheers
@damienalexandre Only just got around to testing this today and, so far, the signs are good. Thanks once again!
Hi @damienalexandre ... Thank you for continuing to collate and update the synonyms files ...
Related to #26, I've used your example mappings with one slight change (placing the synonyms file in `/etc/elasticsearch/synonyms` instead), as follows:

- Installed the `analysis-icu` plugin and restarted Elasticsearch
- `sudo cp synonyms/cldr-emoji-annotation-synonyms-en.txt /etc/elasticsearch/synonyms/`
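The sketch below shows index settings that chain the `icu_tokenizer` into a `synonym` filter reading the copied file. It is not the project's example mapping: the filter and analyzer names are hypothetical.

`synonyms_path` is resolved relative to the Elasticsearch config directory, which is `/etc/elasticsearch` on a typical Ubuntu package install.

```python
import json


def emoji_search_settings(synonyms_path: str) -> dict:
    """Build a hypothetical index-settings body: ICU tokenizer, then a synonym
    filter loading rules from synonyms_path (relative to the config dir)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "emoji_synonyms": {  # hypothetical filter name
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                    }
                },
                "analyzer": {
                    "emoji_analyzer": {  # hypothetical analyzer name
                        "type": "custom",
                        "tokenizer": "icu_tokenizer",  # requires the analysis-icu plugin
                        "filter": ["emoji_synonyms"],
                    }
                },
            }
        }
    }


body = emoji_search_settings("synonyms/cldr-emoji-annotation-synonyms-en.txt")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The synonym rules are tokenized by `icu_tokenizer` when this filter is built. Any rule term that the tokenizer eliminates therefore surfaces as an error at index creation, not at search time.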
Then ...
which gives me ...
Line 859 is the first instance of a synonym which has a non-alpha synonym, in this case:

Removing the `*` from the definition works, but then the same issue recurs from line 1257 (`✅ => ✅, ✓, button, check, mark`) onwards.

This is on Elasticsearch 7.8.0 on Ubuntu 20.04. Is this a problem with the synonyms file or am I missing something very obvious?