ylxb23 / ictclas4j

Automatically exported from code.google.com/p/ictclas4j

tokenizing text causes an endless loop #13

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Tokenizing this text causes an endless loop: 月份牌

Stacktrace:
at org.ictclas4j.segment.NShortPath.getPaths(NShortPath.java:119)
at org.ictclas4j.segment.SegTag.split(SegTag.java:98)
at org.languagetool.tokenizers.zh.ChineseWordTokenizer.tokenize(ChineseWordTokenizer.java:65)
at org.languagetool.JLanguageTool.getRawAnalyzedSentence(JLanguageTool.java:759)
[...]

I copied this report from
http://sourceforge.net/tracker/?func=detail&aid=3564124&group_id=110216&atid=655717,
as ictclas4j is now maintained again.

Original issue reported on code.google.com by dan80...@gmail.com on 15 Sep 2012 at 9:32

GoogleCodeExporter commented 8 years ago

Original comment by richard.eckart on 3 Oct 2012 at 11:24

Attachments:

GoogleCodeExporter commented 8 years ago
As nobody has stepped forward to take any better action, I'm now going to apply
this patch from Daniel and will do a Maven release of the project. I'll leave 
the issue open, because the patch does not really fix the issue.

Also, I do not know if the issue is a duplicate of another issue here because 
all the other issues are documented in Chinese. 

Original comment by richard.eckart on 3 Oct 2012 at 11:26

GoogleCodeExporter commented 8 years ago
Applied slightly modified patch. Throws IllegalStateException instead of 
RuntimeException.

Original comment by richard.eckart on 3 Oct 2012 at 12:04
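
The patch itself is not reproduced in this thread, so the following is only a rough sketch of the general idea described above: bound a loop that may never terminate and throw IllegalStateException once the bound is exceeded, so the caller gets an error instead of a hang. It is not the ictclas4j code. The class name, the method walkBack, the predecessor-array layout, and the MAX_STEPS limit are all hypothetical and chosen for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only; not taken from ictclas4j or the applied patch.
final class LoopGuardSketch {

    // Hypothetical upper bound on loop iterations (assumption, not from the patch).
    static final int MAX_STEPS = 10_000;

    /**
     * Walks predecessor links from the last node back to node 0.
     * prev[i] holds the predecessor of node i; prev must be non-empty.
     * If the links contain a cycle, an unguarded loop would never reach node 0.
     */
    static Deque<Integer> walkBack(int[] prev) {
        Deque<Integer> path = new ArrayDeque<>();
        int node = prev.length - 1;
        int steps = 0;
        while (node != 0) {
            // Guard: abort instead of spinning forever on malformed input.
            if (++steps > MAX_STEPS) {
                throw new IllegalStateException(
                        "path walk exceeded " + MAX_STEPS
                        + " steps; possible cycle at node " + node);
            }
            path.addFirst(node);
            node = prev[node];
        }
        path.addFirst(0);
        return path;
    }
}
```

A well-formed chain such as `walkBack(new int[] {0, 0, 1, 2})` returns the path from node 0 to node 3, while an input in which two nodes point at each other, such as `new int[] {0, 2, 1, 2}`, ends in an IllegalStateException. As noted in the comments above, a guard of this kind only turns the hang into an error; it does not address whatever makes the loop fail to terminate.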