jolicode / emoji-search

:smile: Emoji synonyms to build your own emoji-capable search engine (elasticsearch, solr, OpenSearch)
https://jolicode.com/blog/elasticsearch-icu-now-understands-emoji
MIT License

Watch for Lucene 6.7 as it will ship with an up-to-date ICU! <3 #17

Closed by damienalexandre 5 years ago

damienalexandre commented 7 years ago

https://issues.apache.org/jira/projects/LUCENE/versions/12340572

We may be able to simplify the dependencies now that https://issues.apache.org/jira/browse/LUCENE-7540 is fixed.

damienalexandre commented 6 years ago

We are compatible with ICU 60.2, but our hack is still needed.

damienalexandre commented 6 years ago

Lucene 7.0 = ICU 59.1, the version also used in Elasticsearch. Lucene 7.3 = ICU 60.2.

damienalexandre commented 6 years ago

I tried to update ICU to 60.2 and ran into issues with the DefaultICUTokenizerConfig class:

https://github.com/apache/lucene-solr/blob/df0f141907b0701d7b1f1fc297ae33ef901844a0/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/DefaultICUTokenizerConfig.java#L67-L71

There is an attempt to read the default rule files even when nothing is done with them, but those files go through a signature authentication, and I think Lucene/ES/ICU are not reading the proper ones (Lucene expects a specific ICU version, and I have another one). See http://grepcode.com/file/repo1.maven.org/maven2/com.ibm.icu/icu4j/55.1/com/ibm/icu/impl/ICUBinary.java#37 for example.

Maybe we could try providing our own rbbi rules via https://github.com/elastic/elasticsearch/pull/13651.
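
To make the idea concrete, here is a rough sketch of what the index settings might look like if we went that way. It assumes the `rule_files` option documented for the `icu_tokenizer` in the analysis-icu plugin is what that pull request provides, and that the rule file lives in the Elasticsearch config directory. The file name `emoji.rbbi`, the script code `Latn`, and the tokenizer/analyzer names are placeholders, not something that exists in this repository.

```python
import json


def icu_tokenizer_settings(rule_files: str) -> dict:
    """Build index settings for an icu_tokenizer that loads custom RBBI rules.

    `rule_files` is assumed to follow the "script:file" form described in the
    analysis-icu documentation, e.g. "Latn:emoji.rbbi" (hypothetical file name).
    """
    return {
        "settings": {
            "index": {
                "analysis": {
                    "tokenizer": {
                        # Hypothetical tokenizer name, chosen for illustration.
                        "emoji_icu_tokenizer": {
                            "type": "icu_tokenizer",
                            "rule_files": rule_files,
                        }
                    },
                    "analyzer": {
                        # Hypothetical analyzer wired to the tokenizer above.
                        "emoji_analyzer": {
                            "type": "custom",
                            "tokenizer": "emoji_icu_tokenizer",
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the index.
    print(json.dumps(icu_tokenizer_settings("Latn:emoji.rbbi"), indent=2))
```

Whether this would actually let us drop the hack is untested; it only shows where the custom rules would be plugged in.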