infinilabs / analysis-pinyin

🛵 This Pinyin Analysis plugin is used to do conversion between Chinese characters and Pinyin.
Apache License 2.0

fix "startOffset must be non-negative, and endOffset must be >= start… #283

Open jiangyunpeng opened 1 year ago

jiangyunpeng commented 1 year ago

Hello, we found that PinyinTokenizer returns wrong offsets when the input token contains special characters such as "-", "%", or "<", which causes Lucene to throw a "startOffset must be non-negative, and endOffset must be >= start…" exception.

See the PinyinAnalysisTest.TestPinyinPosition5() test case.
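
For readers unfamiliar with the invariant behind that exception, here is a minimal, self-contained sketch of the kind of check involved. It is not the plugin's or Lucene's code: the class `OffsetCheck`, its methods, and the error text are hypothetical names chosen for illustration, and the sample offsets are made up rather than taken from the failing test.

```java
public class OffsetCheck {

    // Hypothetical helper: a token's offsets are valid only if the start is
    // non-negative and the end is not before the start.
    static boolean isValid(int startOffset, int endOffset) {
        return startOffset >= 0 && endOffset >= startOffset;
    }

    // Hypothetical setter that rejects invalid offsets, similar in spirit to
    // the check that produces the exception reported above.
    static void setOffset(int startOffset, int endOffset) {
        if (!isValid(startOffset, endOffset)) {
            throw new IllegalArgumentException(
                "invalid offsets: startOffset=" + startOffset
                + ", endOffset=" + endOffset);
        }
    }

    public static void main(String[] args) {
        // A well-formed token span is accepted.
        setOffset(0, 2);

        // A span whose end precedes its start is rejected. A tokenizer that
        // miscounts characters could plausibly emit such a pair.
        try {
            setOffset(3, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A tokenizer that keeps its own running position has to maintain both conditions for every emitted token, including tokens that contain punctuation or symbols.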