Closed felixbarny closed 8 years ago
Interestingly, this problem only appears when the word has the `-ung` suffix. When analyzing `Wandhalter`, everything works as expected.
I kind of solved this by applying a stemmer before and after the decomp filter:
```yaml
index:
  analysis:
    analyzer:
      analyzer_decomp:
        type: custom
        tokenizer: standard
        filter: [lowercase, snow_de, decomp, snow_de]
    filter:
      decomp:
        type: decompound
      snow_de:
        type: snowball
        language: German2
```
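The intuition behind running the stemmer before the decompound filter can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: `toy_stem` naively strips `-ung` (standing in for the German2 Snowball stemmer) and `toy_decompound` does a greedy dictionary split (not the plugin's actual algorithm), but it shows why removing the suffix first lets the split succeed:

```python
def toy_stem(word: str) -> str:
    """Naively strip the German derivational suffix '-ung'.

    Toy stand-in for the snowball German2 stemmer, which removes
    '-ung' among other derivational suffixes.
    """
    return word[:-3] if word.endswith("ung") else word


def toy_decompound(word: str, dictionary: set) -> list:
    """Greedy two-part dictionary split.

    Toy stand-in for the decompound token filter: return the first
    split where both halves are known words, else the word unchanged.
    """
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in dictionary and tail in dictionary:
            return [head, tail]
    return [word]


dictionary = {"wand", "halter"}

# Without pre-stemming, "wandhalterung" has no clean dictionary split:
print(toy_decompound("wandhalterung", dictionary))                # ['wandhalterung']

# With pre-stemming, the '-ung' is gone and the split succeeds:
print(toy_decompound(toy_stem("wandhalterung"), dictionary))      # ['wand', 'halter']
```

The second stemmer pass in the filter chain then normalizes the split components themselves, so that e.g. query-time and index-time forms agree.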
```yaml
tokenizer:
  decomp:
    type: standard
    filter:
      - decomp
```
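To see which tokens an analyzer actually emits, the `_analyze` API can be used; here the index name `myindex` is an assumption, and `analyzer_decomp` is the analyzer defined in the settings above:

```
GET /myindex/_analyze?analyzer=analyzer_decomp&text=Wandhalterung
```

Comparing its output with and without the surrounding `snow_de` entries in the filter chain makes it easy to verify whether the stemmer workaround changes the split.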
The term `Wandhalterung` is split into the tokens `wand`, `alterung` instead of `wand`, `halterung`. When setting the threshold to `0.63` or higher, the tokens are `wandh` and `alterung`. What can I do to fix this? These are my settings:
I'm using Elasticsearch 2.1.1 and elasticsearch-analysis-decompound 2.1.1.0