damienalexandre closed this issue 2 years ago
I tried to import https://github.com/apache/lucene-solr/blob/4522e45bdadd4268a9270135130fc28a7f46c627/lucene/analysis/icu/src/data/uax29/Default.rbbi as a custom RBBI config. It seems to load fine, but the following error suggests that some word breaking may still be wrong:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "failed to build synonyms"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "failed to build synonyms",
    "caused_by": {
      "type": "parse_exception",
      "reason": "Invalid synonym rule at line 1",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "term: 😀 was completely eliminated by analyzer"
      }
    }
  },
  "status": 400
}
The plugin is not needed anymore.
Looks like we could just provide a new Rule File instead of tricking the ICU Tokenizer.
As this code shows:
https://github.com/apache/lucene-solr/blob/23bff7dbc207083af2ccb1b308c121ac18c36508/lucene/analysis/icu/src/java/org/apache/lucene/analysis/icu/segmentation/ICUTokenizerFactory.java#L116-L125
The default config is still used when there is no rule file for the current "script", which addresses the concern I had about changing the RBBI rules.
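For illustration, here is a minimal sketch of how a rule file could be wired into the ICU tokenizer through index settings rather than through a plugin. It relies on the `rule_files` parameter of Elasticsearch's `icu_tokenizer`, which takes comma-separated `Code:file` pairs keyed by four-letter ISO 15924 script code. The helper name `icu_tokenizer_settings`, the tokenizer and analyzer names, and the `Latn`/`Default.rbbi` pairing are assumptions made for the example, not something established in this thread. In particular, which script code the emoji rules should be attached to is not settled here.

```python
import json


def icu_tokenizer_settings(rule_files):
    """Build index settings for an icu_tokenizer with per-script rule files.

    rule_files maps a four-letter ISO 15924 script code to an .rbbi file name.
    Scripts that are not listed keep using the tokenizer's default rules.
    """
    for code in rule_files:
        if len(code) != 4 or not code.isalpha():
            raise ValueError(f"not a four-letter script code: {code!r}")

    # Elasticsearch expects "Code:file" pairs joined by commas.
    pairs = ",".join(f"{code}:{name}" for code, name in rule_files.items())

    return {
        "settings": {
            "index": {
                "analysis": {
                    "tokenizer": {
                        # "custom_icu" is a hypothetical tokenizer name.
                        "custom_icu": {
                            "type": "icu_tokenizer",
                            "rule_files": pairs,
                        }
                    },
                    "analyzer": {
                        # "custom_icu_analyzer" is a hypothetical analyzer name.
                        "custom_icu_analyzer": {
                            "type": "custom",
                            "tokenizer": "custom_icu",
                        }
                    },
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical pairing: the script code and file name are placeholders.
    print(json.dumps(icu_tokenizer_settings({"Latn": "Default.rbbi"}), indent=2))
```

The printed JSON is meant to be used as the body of an index creation request, with the referenced `.rbbi` file placed in the Elasticsearch config directory.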
What the plugin could do then: