Open guoxijun opened 7 years ago
Which version are you using?
The latest, 5.4.0.
In the same way, I defined another analyzer; its tokenizer is: "mobile_tokenizer" : { "type" : "nGram", "min_gram" : 3, "max_gram" : 20, "token_chars" : ["letter","digit"] }
The mapping is as follows: "phones": { "type": "string", "analyzer": "mobile_analyzer" }
The _analyze test output does contain 2219:
25: token "2219" start_offset 3 end_offset 7 type "word" position 25
But when I search: curl -XGET 'http://192.168.36.140:9200/1/users/_search?pretty' -d '{
"query": { "bool":{ "must":{ "multi_match":{ "query": "2219", "fields":["phones"], "type": "phrase" } } } }
}'
{ "took" : 0, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } }
This also returns no hits. Everything I am using is 5.4.0.
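One possible explanation, offered as a sketch rather than a confirmed diagnosis: the _analyze output above shows "2219" at position 25, which suggests the nGram tokenizer gives every gram its own position. A phrase query analyzes "2219" with the same tokenizer and then requires all resulting grams to appear at the same relative positions in the document. The Python below only simulates that behaviour; `ngram_tokens`, `phrase_matches`, and the phone number are hypothetical and are not Elasticsearch APIs.

```python
def ngram_tokens(text, min_gram=3, max_gram=20):
    """Simulate the assumed nGram behaviour: for each start offset, emit grams
    of increasing length, and give every gram its own position."""
    tokens = []
    position = 0
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break
            tokens.append((text[start:end], start, position))
            position += 1
    return tokens


def phrase_matches(doc_tokens, query_tokens):
    """Return True if all query tokens occur in the document at the same
    relative positions (a phrase match with slop 0)."""
    doc_positions = {}
    for token, _, pos in doc_tokens:
        doc_positions.setdefault(token, set()).add(pos)
    first_token, _, first_pos = query_tokens[0]
    for base in doc_positions.get(first_token, ()):
        if all((base + pos - first_pos) in doc_positions.get(tok, ())
               for tok, _, pos in query_tokens):
            return True
    return False


phone = "13822195678"  # hypothetical 11-digit number with "2219" at offset 3
doc = ngram_tokens(phone)
query = ngram_tokens("2219")

print([t for t in doc if t[0] == "2219"])  # token, start_offset, position
print(query)                               # grams produced for the query text
print(phrase_matches(doc, query))          # does the phrase line up?
```

In this simulation the query becomes "221", "2219", "219" at positions 0, 1, 2, while in the document the gram that follows "2219" is "22195", so the phrase does not line up. If that is what happens on the server, options worth testing are a separate `search_analyzer` for the phones field or a non-phrase query for that field.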
Here are my full settings: curl -XPUT http://192.168.36.140:9200/1/ -d' { "settings" : { "analysis" : { "analyzer" : { "pinyin_analyzer" : { "tokenizer" : "pinyin_tokenizer" }, "email_analyzer" : { "tokenizer" : "email_tokenizer", "char_filter": ["email_char_filter"] }, "mobile_analyzer" : { "tokenizer" : "mobile_tokenizer" } }, "tokenizer" : { "pinyin_tokenizer" : { "type" : "pinyin", "keep_first_letter":true, "keep_separate_first_letter" : true, "keep_full_pinyin" : true, "keep_original" : false, "limit_first_letter_length" : 16, "lowercase" : true, "keep_joined_full_pinyin":true }, "email_tokenizer" : { "type" : "nGram", "min_gram" : 1, "max_gram" : 20, "token_chars" : ["letter"] }, "mobile_tokenizer" : { "type" : "nGram", "min_gram" : 3, "max_gram" : 20, "token_chars" : ["digit"] } }, "char_filter" : { "email_char_filter" : { "type" : "pattern_replace", "pattern" : "(@.*)", "replacement" : "" } } } }, "mappings" : { "users" : { "properties" : { "realName": { "type": "keyword", "fields": { "pinyin": { "type": "text", "store": "no", "term_vector": "with_positions_offsets", "analyzer": "pinyin_analyzer", "boost":10 } } }, "emails": { "type": "string", "analyzer": "email_analyzer" }, "phones": { "type": "string", "analyzer": "mobile_analyzer" } } }, "depts" : { "properties" : { "name": { "type": "keyword", "fields": { "pinyin": { "type": "text", "store": "no", "term_vector": "with_positions_offsets", "analyzer": "pinyin_analyzer", "boost":10 } } } } }, "groups" : { "properties" : { "name": { "type": "keyword", "fields": { "pinyin": { "type": "text", "store": "no", "term_vector": "with_positions_offsets", "analyzer": "pinyin_analyzer", "boost":10 } } } } } } }'
If I search without the phrase type, the results are fine:
curl -XGET 'http://192.168.36.140:9200/1/users/_search?pretty' -d '{
"query": {
"bool":{
"must":{
"multi_match":{
"query": "2219",
"fields":["realName","realName.pinyin","phones","emails"]
}
}
}
}
}'
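But when I search for zdf this way, the results are inaccurate. If I add the phrase type, the zdf search is accurate, but then 2219 cannot be found.
I am not sure whether you understand what I mean.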
Tokenization test result: http://192.168.36.140:9200/1/_analyze?text=周大福&analyzer=pinyin_analyzer
tokens
0: token "zhou" start_offset 0 end_offset 1 type "word" position 0
1: token "zdf" start_offset 0 end_offset 3 type "word" position 0
2: token "da" start_offset 1 end_offset 2 type "word" position 1
3: token "fu" start_offset 2 end_offset 3 type "word" position 2
Query: curl -XGET 'http://192.168.36.140:9200/1/users/_search?pretty' -d '{ "query": { "bool":{
"must":{ "multi_match":{ "query": "zdf", "fields":["realName","realName.pinyin"], "type": "phrase" } } } } }'
This returns no results, but searching for zhoudafu, zhou, dafu, and so on does return hits. Why is that?