infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0
2.96k stars 548 forks

elasticsearch-analysis-pinyin does not tokenize first-letter abbreviations or full pinyin well #135

Open huaqixirenyy opened 7 years ago

huaqixirenyy commented 7 years ago

Hello,

I am learning to use Elasticsearch and its analyzers. I am on version 5.0.2, and the IK and pinyin analysis plugins are the released 5.0.2 builds downloaded from GitHub. I created the index with the following mapping and settings:

{
  "mappings": {
    "test20170909": {
      "_all": { "enabled": false },
      "properties": {
        "objname": {
          "type": "text",
          "analyzer": "ik_max_word",
          "boost": 1,
          "fields": {
            "pinyin": {
              "type": "text",
              "analyzer": "pinyin_analyzer",
              "search_analyzer": "pinyin_analyzer",
              "term_vector": "with_positions_offsets",
              "boost": 2
            },
            "raw": { "type": "keyword", "boost": 3 }
          }
        }
      }
    }
  },
  "settings": {
    "index.mapping.ignore_malformed": true,
    "index.mapper.dynamic": false,
    "index": {
      "analysis": {
        "filter": { },
        "analyzer": {
          "pinyin_analyzer": { "tokenizer": "my_pinyin" }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
            "keep_joined_full_pinyin": "true",
            "lowercase": "true",
            "none_chinese_pinyin_tokenize": "true",
            "keep_original": "false",
            "keep_none_chinese_together": "true",
            "keep_none_chinese": "true",
            "keep_separate_first_letter": "false",
            "limit_first_letter_length": "16",
            "keep_full_pinyin": "false",
            "edgegram_last_joined_full_pinyin": "false",
            "keep_cacuminal_in_first_letter": "false"
          }
        }
      },
      "number_of_shards": "1",
      "number_of_replicas": "2"
    }
  }
}

...:/test20170909/_analyze?analyzer=pinyin_analyzer&text=wangqiu
...:/test20170909/_analyze?analyzer=pinyin_analyzer&text=wq

Both requests return an empty token list. I have already set none_chinese_pinyin_tokenize to true. Is this caused by the version being too old?
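For anyone trying to reproduce this, the two checks above could be scripted roughly as follows. This is only a sketch: the base URL `http://localhost:9200` is an assumption (the host is elided in the report), the helper names `build_analyze_request` and `extract_tokens` are made up for illustration, and the request-body form of `_analyze` is used in place of the query-string form shown above. It only builds the requests and parses a response; sending them is left to whatever HTTP client is at hand.

```python
import json
from urllib.parse import quote


def build_analyze_request(base_url, index, analyzer, text):
    """Return (url, json_body) for an index-level _analyze call.

    Hypothetical helper: it does not send anything, it only prepares
    the URL and the JSON request body.
    """
    url = f"{base_url.rstrip('/')}/{quote(index)}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text})
    return url, body


def extract_tokens(response):
    """Pull the token strings out of a parsed _analyze response dict."""
    return [entry["token"] for entry in response.get("tokens", [])]


if __name__ == "__main__":
    # Assumed host; the original report elides it.
    base_url = "http://localhost:9200"
    for text in ("wangqiu", "wq"):
        url, body = build_analyze_request(
            base_url, "test20170909", "pinyin_analyzer", text
        )
        print("POST", url, body)

    # An empty "tokens" array is the symptom described in this issue.
    print(extract_tokens({"tokens": []}))
```

Posting the full JSON response for each request, rather than only noting that it is empty, should make it easier to tell an empty token list apart from an error response.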

medcl commented 7 years ago

Is there any error message?