Kakueeen closed 3 years ago
@alanw this is breaking the test suite...
[----------] 5 tests from ChineseTokenizerTest
[ RUN ] ChineseTokenizerTest.testOtherLetterOffset
[ OK ] ChineseTokenizerTest.testOtherLetterOffset (0 ms)
[ RUN ] ChineseTokenizerTest.testReusableTokenStream1
[ OK ] ChineseTokenizerTest.testReusableTokenStream1 (0 ms)
[ RUN ] ChineseTokenizerTest.testReusableTokenStream2
[ OK ] ChineseTokenizerTest.testReusableTokenStream2 (1 ms)
[ RUN ] ChineseTokenizerTest.testNumerics
/<<PKGBUILDDIR>>/src/test/analysis/BaseTokenStreamFixture.cpp:127: Failure
Value of: !ts->incrementToken()
Actual: false
Expected: true
[ FAILED ] ChineseTokenizerTest.testNumerics (0 ms)
[ RUN ] ChineseTokenizerTest.testEnglish
[ OK ] ChineseTokenizerTest.testEnglish (0 ms)
[----------] 5 tests from ChineseTokenizerTest (1 ms total)
@Kakueeen ^^
I know why this case failed. Before this submission, incrementToken would return false if the content was purely numeric; now that pure numbers are supported, it returns true for that input. Do I need to modify the unit test?
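To illustrate the behaviour change described in the comment above, here is a minimal, self-contained sketch. It is not the Lucene++ ChineseTokenizer: `toyTokenize` and its `emitDigits` flag are hypothetical names made up for this example, and it only handles ASCII letters and digits. With `emitDigits` set to false, a purely numeric input yields no tokens (the old expectation in testNumerics); with it set to true, the same input yields a token, which is why an assertion that expects no further tokens would fail.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a tokenizer; NOT the Lucene++ implementation.
// Splits ASCII text into runs of letters and runs of digits.
// emitDigits == false: digit runs are skipped (assumed old behaviour).
// emitDigits == true:  digit runs are emitted as tokens (assumed new behaviour).
std::vector<std::string> toyTokenize(const std::string& text, bool emitDigits) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        unsigned char c = static_cast<unsigned char>(text[i]);
        bool isDigit = std::isdigit(c) != 0;
        bool isAlpha = std::isalpha(c) != 0;
        if (!isDigit && !isAlpha) {
            ++i;  // skip separators such as spaces and punctuation
            continue;
        }
        // Consume the whole run of characters of the same class.
        std::size_t start = i;
        while (i < text.size()) {
            unsigned char d = static_cast<unsigned char>(text[i]);
            bool sameClass = isDigit ? (std::isdigit(d) != 0) : (std::isalpha(d) != 0);
            if (!sameClass) {
                break;
            }
            ++i;
        }
        if (!isDigit || emitDigits) {
            tokens.push_back(text.substr(start, i - start));
        }
    }
    return tokens;
}
```

Under these assumptions, a test written against the old behaviour would expect an empty result for a purely numeric string, so it would need updating once digit runs are emitted.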
@LocutusOfBorg @alanw
Hello, if you have a patch, please submit it. For now I had to upload to Debian and Ubuntu without this pull request because it breaks the tests...
Description: When I use ChineseAnalyzer for Chinese word segmentation, I find that English words and numbers are treated as one word. I think they should be separated.
RootCause: Null
Solution: