Open · sissilab opened this issue 1 year ago
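With the ik analyzer, non-ASCII characters are dropped outright. In the example below, `açaí à` produces no tokens at all. How should this case be handled?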
```
GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "açaí à la carte"
}
```

Response:

```
{
  "tokens": [
    { "token": "la",    "start_offset": 7,  "end_offset": 9,  "type": "ENGLISH", "position": 0 },
    { "token": "carte", "start_offset": 10, "end_offset": 15, "type": "ENGLISH", "position": 1 }
  ]
}
```
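The token offsets in the ik response show exactly what was lost. The sketch below marks every character covered by a returned `start_offset`/`end_offset` pair and collects whatever non-whitespace text is left over. The tokens are copied by hand from the response above, and `uncovered_spans` is a hypothetical helper written only for this check.

```python
text = "a\u00e7a\u00ed \u00e0 la carte"  # "açaí à la carte", precomposed characters

# (token, start_offset, end_offset) copied from the ik_max_word response above
ik_tokens = [("la", 7, 9), ("carte", 10, 15)]

def uncovered_spans(text, tokens):
    """Return the non-whitespace substrings that no token's offsets cover."""
    covered = [False] * len(text)
    for _, start, end in tokens:
        for i in range(start, end):
            covered[i] = True
    spans, current = [], ""
    for ch, is_covered in zip(text, covered):
        if is_covered or ch.isspace():
            if current:  # close the current uncovered run
                spans.append(current)
                current = ""
        else:
            current += ch
    if current:
        spans.append(current)
    return spans

# Each token's offsets slice back to the token text itself.
for token, start, end in ik_tokens:
    assert text[start:end] == token

print(uncovered_spans(text, ik_tokens))  # ['açaí', 'à']
```

Both words that contain an accented letter disappear completely. This includes the plain ASCII `a` characters inside `açaí`.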
For comparison, here is the result with the `asciifolding` filter, which converts these characters to ASCII:
```
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["asciifolding"],
  "text": "açaí à la carte"
}
```

Response:

```
{
  "tokens": [
    { "token": "acai",  "start_offset": 0,  "end_offset": 4,  "type": "<ALPHANUM>", "position": 0 },
    { "token": "a",     "start_offset": 5,  "end_offset": 6,  "type": "<ALPHANUM>", "position": 1 },
    { "token": "la",    "start_offset": 7,  "end_offset": 9,  "type": "<ALPHANUM>", "position": 2 },
    { "token": "carte", "start_offset": 10, "end_offset": 15, "type": "<ALPHANUM>", "position": 3 }
  ]
}
```
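Adding `asciifolding` after ik would not help, because token filters only see tokens the tokenizer has already emitted. Since ik emits nothing for `açaí`, there is nothing left to fold. One possible workaround is to fold the characters *before* tokenization, either client-side or with Elasticsearch's built-in `mapping` char filter.

The sketch below shows both approaches under several assumptions. `fold_to_ascii` approximates ASCII folding with Unicode NFKD decomposition and drops combining marks. It is not Lucene's `ASCIIFoldingFilter`, which uses an explicit table and also handles letters like `ø` or `æ` that have no decomposition. The char filter name `latin_fold` and analyzer name `ik_folded` are hypothetical. The sketch also assumes the ik plugin exposes `ik_max_word` as a tokenizer as well as an analyzer.

```python
import json
import unicodedata

def fold_to_ascii(text: str) -> str:
    """Approximate ASCII folding: NFKD-decompose, then drop combining marks.

    Unlike Lucene's table-based ASCIIFoldingFilter, this leaves letters with
    no decomposition (e.g. 'ø', 'æ') unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def mapping_rules(text: str) -> list[str]:
    """Build 'mapping' char_filter rules for each non-ASCII char in `text`
    that this approximation folds to pure ASCII."""
    rules = []
    for ch in sorted(set(text)):
        folded = fold_to_ascii(ch)
        if not ch.isascii() and folded and folded.isascii():
            rules.append(f"{ch} => {folded}")
    return rules

text = "a\u00e7a\u00ed \u00e0 la carte"  # "açaí à la carte"
print(fold_to_ascii(text))  # acai a la carte

# Hypothetical index settings: the char_filter rewrites the text before the
# ik tokenizer sees it. Rules are derived from the sample text only; a real
# index would need a rule for every accented character it expects.
settings = {
    "analysis": {
        "char_filter": {
            "latin_fold": {"type": "mapping", "mappings": mapping_rules(text)}
        },
        "analyzer": {
            "ik_folded": {
                "type": "custom",
                "char_filter": ["latin_fold"],
                "tokenizer": "ik_max_word",
            }
        },
    }
}
print(json.dumps(settings, ensure_ascii=False, indent=2))
```

For the sample text, the generated rules are `à => a`, `ç => c` and `í => i`. Whether ik then tokenizes `acai` the way it tokenizes `la` and `carte` should be confirmed with `_analyze` against the new analyzer.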