Hello, we found that PinyinTokenizer returns wrong offsets when the input token contains special characters such as "-", "%", or "<", which makes Lucene throw a "startOffset must be non-negative, and endOffset must be >= start…" exception.
See the PinyinAnalysisTest.TestPinyinPosition5() test case.
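For reference, the condition named in the exception message can be written as a small standalone check. This is only a sketch of that condition, not Lucene's or the plugin's actual code: `OffsetCheck` and `isValid` are hypothetical names introduced here for illustration.

```java
// Hypothetical helper (not part of Lucene or the pinyin plugin): it mirrors the
// condition quoted in the exception message so offending offsets are easy to spot.
public final class OffsetCheck {
    private OffsetCheck() {}

    /** Returns true when startOffset is non-negative and endOffset is not before it. */
    public static boolean isValid(int startOffset, int endOffset) {
        return startOffset >= 0 && endOffset >= startOffset;
    }

    public static void main(String[] args) {
        System.out.println(isValid(0, 2));  // a well-formed offset pair
        System.out.println(isValid(3, 1));  // end before start: rejected
        System.out.println(isValid(-1, 2)); // negative start: rejected
    }
}
```

A check like this could be applied to each token's offsets in a test to locate the first token whose offsets break the condition.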