infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

es7.17.0: still getting the startOffset error with version 7.17.0 #284

Open zt5062 opened 1 year ago

zt5062 commented 1 year ago

Using version 7.17.0 on ES 7.17.0 still raises the error: startOffset must be non-negative, and endOffset must be >= startOffset, and offsets must not go backwards
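For readers unfamiliar with the message, it names three conditions on the token offsets that an analyzer emits. The following is a minimal Python sketch of such a check, written only from the wording of the error; it is not the actual Lucene or plugin code, and the function name `validate_offsets` and the `(start, end)` tuple format are assumptions made for illustration.

```python
def validate_offsets(tokens):
    """Check (start_offset, end_offset) pairs in the order they are emitted.

    Hypothetical helper: it mirrors the three conditions named in the
    error message, not the real indexing code.
    """
    last_start = 0
    for position, (start, end) in enumerate(tokens):
        if start < 0:
            raise ValueError(f"token {position}: startOffset {start} is negative")
        if end < start:
            raise ValueError(f"token {position}: endOffset {end} < startOffset {start}")
        if start < last_start:
            raise ValueError(
                f"token {position}: startOffset {start} goes backwards "
                f"(previous startOffset was {last_start})"
            )
        last_start = start
    return True
```

Under these assumptions, a token stream whose start offsets decrease from one token to the next (for example `[(5, 7), (0, 4)]`) is rejected, which is the "offsets must not go backwards" case.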

zt5062 commented 1 year ago

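The error always occurs when the inserted data contains English letters followed by a symbol, for example CCTV-高清, whereas CCTV高清- can be inserted normally. The same error is also raised when the text contains Chinese punctuation, such as the Chinese book title marks <>.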

zt5062 commented 1 year ago

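After sorting through the cases: characters fall into three categories: Chinese characters, English letters, and symbols (both Chinese and English symbols). Text starting with a Chinese character: no problem. Text starting with a symbol: always fails. Text starting with English letters: fine if the English string is followed by Chinese characters, but fails if it is followed by a symbol.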
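The rule described above can be written down as a small Python sketch, which may help when picking test strings. It only encodes the reported observations and does not explain the cause; the function names, the Han range U+4E00 to U+9FFF, and the use of Unicode punctuation/symbol categories to detect "symbols" are all assumptions.

```python
import unicodedata


def char_class(ch):
    """Classify one character into the three categories from the report."""
    if "\u4e00" <= ch <= "\u9fff":  # assumption: basic CJK Unified Ideographs only
        return "han"
    if ch.isascii() and ch.isalpha():
        return "english"
    if unicodedata.category(ch)[0] in ("P", "S"):  # punctuation or symbol
        return "symbol"
    return "other"  # digits, spaces, etc. are not covered by the report


def expected_to_fail(text):
    """Return True/False per the reported rule, or None if the rule is silent."""
    if not text:
        return None
    first = char_class(text[0])
    if first == "han":
        return False
    if first == "symbol":
        return True
    if first != "english":
        return None
    # Leading English run: the outcome depends on what follows it.
    for ch in text:
        cls = char_class(ch)
        if cls == "english":
            continue
        if cls == "symbol":
            return True
        if cls == "han":
            return False
        return None
    return None  # English letters only: not covered by the report
```

For the two strings from the earlier comment, `expected_to_fail("CCTV-高清")` returns `True` and `expected_to_fail("CCTV高清-")` returns `False`, matching what was observed.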

xiaoshi2013 commented 6 months ago

I tried it on Elasticsearch 8.4.1 and it works:

POST /medcl/_doc
{ "name": "CCTV-高清" }

POST medcl/_search
{
  "query": {
    "match": {
      "name.pinyin": { "query": "cctvgq" }
    }
  }
}

Output:

{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 2.4876804,
    "hits": [
      {
        "_index": "medcl",
        "_id": "zl8E_Y0BKB1kh6YAO_6p",
        "_score": 2.4876804,
        "_source": { "name": "CCTV-高清" }
      }
    ]
  }
}