hankcs / hanlp-lucene-plugin

HanLP Chinese word segmentation plugin for Lucene; supports Lucene-based systems, including Solr
http://www.hankcs.com/nlp/segment/full-text-retrieval-solr-integrated-hanlp-chinese-word-segmentation.html
Apache License 2.0
296 stars 99 forks

Running com/hankcs/lucene/HighLighterTest.java throws an exception #29

Closed kevindragon closed 6 years ago

kevindragon commented 6 years ago

Running com/hankcs/lucene/HighLighterTest.java results in:

org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token   exceeds length of provided text sized 10
    at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:231)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragments(Highlighter.java:161)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:107)
    at org.apache.lucene.search.highlight.Highlighter.getBestFragment(Highlighter.java:85)
    at com.hankcs.lucene.HighLighterTest.displayHtmlHighlight(HighLighterTest.java:168)
    at com.hankcs.lucene.HighLighterTest.testHightlight(HighLighterTest.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at junit.framework.TestCase.runTest(TestCase.java:176)
    at junit.framework.TestCase.runBare(TestCase.java:141)
    at junit.framework.TestResult$1.protect(TestResult.java:122)
    at junit.framework.TestResult.runProtected(TestResult.java:142)
    at junit.framework.TestResult.run(TestResult.java:125)
    at junit.framework.TestCase.run(TestCase.java:129)
    at junit.framework.TestSuite.runTest(TestSuite.java:255)
    at junit.framework.TestSuite.run(TestSuite.java:250)
    at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:160)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
    at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
    at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
    at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)

This seems to be caused by this line: https://github.com/hankcs/hanlp-lucene-plugin/pull/28/files#diff-6580de02396bcf4c1d775178482af676R113 . According to the documentation:

  /**
   * This method is called by a consumer before it begins consumption using
   * {@link #incrementToken()}.
   * <p>
   * Resets this stream to a clean state. Stateful implementations must implement
   * this method so that they can be reused, just as if they had been created fresh.
   * <p>
   * If you override this method, always call {@code super.reset()}, otherwise
   * some internal state will not be correctly reset (e.g., {@link Tokenizer} will
   * throw {@link IllegalStateException} on further usage).
   */
  public void reset() throws IOException {}

So the offset should not be stored there.
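To illustrate why a carried-over offset breaks highlighting, here is a minimal sketch in plain Java. It does not use Lucene and is not the plugin's code: `SketchTokenizer`, `ResetSketch`, and the `clearOffsetOnReset` flag are hypothetical names made up for this example. It only mimics the contract quoted above, where `reset()` must return the stream to a clean state before it is reused.

```java
/**
 * Simplified stand-in for a reusable tokenizer. This is NOT Lucene's
 * Tokenizer API; all names here are hypothetical and for illustration only.
 * Each character of the text is emitted as one token.
 */
class SketchTokenizer {
    private String text = "";
    private int position = 0; // read position inside the current text
    private int offset = 0;   // start offset reported for the next token
    private final boolean clearOffsetOnReset;

    SketchTokenizer(boolean clearOffsetOnReset) {
        this.clearOffsetOnReset = clearOffsetOnReset;
    }

    void setText(String text) {
        this.text = text;
    }

    /** Called by the consumer before it starts pulling tokens. */
    void reset() {
        position = 0;
        if (clearOffsetOnReset) {
            offset = 0; // clean state: offsets start from zero again
        }
        // Otherwise the offset from the previous run is kept (the assumed bug).
    }

    /** Returns {start, end} of the next token, or null when the text is exhausted. */
    int[] next() {
        if (position >= text.length()) {
            return null;
        }
        int[] span = {offset, offset + 1};
        offset++;
        position++;
        return span;
    }
}

class ResetSketch {
    /** Runs the tokenizer over the text and returns the largest end offset seen. */
    static int maxEndOffset(SketchTokenizer tokenizer, String text) {
        tokenizer.setText(text);
        tokenizer.reset();
        int max = 0;
        for (int[] span = tokenizer.next(); span != null; span = tokenizer.next()) {
            max = Math.max(max, span[1]);
        }
        return max;
    }

    public static void main(String[] args) {
        String text = "0123456789"; // 10 characters, like the text in the stack trace
        SketchTokenizer stale = new SketchTokenizer(false);
        System.out.println(maxEndOffset(stale, text)); // first use
        System.out.println(maxEndOffset(stale, text)); // reuse without clearing the offset
    }
}
```

On the second pass the stale tokenizer reports end offsets beyond the length of the 10-character text, which is the kind of condition the highlighter rejects with InvalidTokenOffsetsException. Constructing the tokenizer with `true` keeps the offsets within the text on every reuse.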

kevindragon commented 6 years ago

fixed in issue #30