languagetool-org / languagetool

Style and Grammar Checker for 25+ Languages
https://languagetool.org
GNU Lesser General Public License v2.1

LT server 'crashes' with StringIndexOutOfBoundsException #3081

Open · ghost opened 4 years ago

ghost commented 4 years ago

When a Dutch text contains a Cyrillic character, the server does not return any JSON but prints a stack trace instead. I don't think this should happen, and it causes the calling app to fail as well.

danielnaber commented 4 years ago

I cannot reproduce, please provide an example input.

ghost commented 4 years ago

Indeed, it does not happen with just any Cyrillic character. The error scrolled past in the output far too fast to catch, so I let my routine continue past it. I will add an example when it happens again.

ghost commented 4 years ago

I tested all the sentences containing Cyrillic again, and there was no crash. It is not worth the trouble of testing all of the data again. The error came from the 'cannot test Dutch sentence' code. Let's ignore it until it happens again.

ghost commented 4 years ago

Here is the line, pasted from the screen:

Zo heeft Hauff het erbarmelijke Russisch van de Fransman in Der Spieler ontdaan van de gramma­ti­cale fouten die het zo grappig maken: ‘эдак ставка неидет… нет, нет, не мож­но…’[583] is vertaald als ‘solche Einsätze gehen nicht an!

Simply paste it into languagetool.org with Dutch selected, and you get an error message. I tried some other languages, but they do not crash.

danielnaber commented 4 years ago

It's not related to Cyrillic, but to the invisible soft hyphens (U+00AD, URL-encoded as %C2%AD) inside "grammaticale":

crash: http://localhost:8081/v2/check?text=de%20gramma%C2%ADti%C2%ADcale%20fouten%20die%20grappig%20maken:%20Eins%C3%A4tze%20an.&language=auto

no crash: http://localhost:8081/v2/check?text=de%20grammaticale%20fouten%20die%20grappig%20maken:%20Eins%C3%A4tze%20an.&language=auto
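The two URLs differ only in the `%C2%AD` sequences, which are the UTF-8 percent-encoding of U+00AD SOFT HYPHEN. The minimal Java sketch below makes that visible: removing U+00AD from the crashing text gives exactly the non-crashing text. The class and method names are hypothetical, the local server on port 8081 is taken from the URLs above, and the code only builds the query strings; it does not call a server.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that rebuilds the two reproduction URLs from this issue.
public class SoftHyphenRepro {
    // U+00AD SOFT HYPHEN: invisible when rendered, but a real char in the string.
    static final String SOFT_HYPHEN = "\u00AD";

    // Assumes a local LanguageTool server on port 8081, as in the URLs above.
    // Note: URLEncoder encodes spaces as '+' rather than %20.
    static String buildCheckUrl(String text, String language) {
        return "http://localhost:8081/v2/check?text="
                + URLEncoder.encode(text, StandardCharsets.UTF_8)
                + "&language=" + URLEncoder.encode(language, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String crashing = "de gramma\u00ADti\u00ADcale fouten die grappig maken: Eins\u00E4tze an.";
        String working = crashing.replace(SOFT_HYPHEN, "");

        // Both strings look identical on screen, but differ by two chars.
        System.out.println(crashing.length() - working.length()); // 2
        System.out.println(working.equals(
                "de grammaticale fouten die grappig maken: Eins\u00E4tze an.")); // true
        // The soft hyphens show up as %C2%AD once the text is URL-encoded.
        System.out.println(buildCheckUrl(crashing, "auto").contains("%C2%AD")); // true
    }
}
```

`URLEncoder.encode(String, Charset)` requires Java 10 or later; on older JDKs the `encode(String, String)` overload with `"UTF-8"` does the same job.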

```
Caused by: java.lang.RuntimeException: Could not check sentence (language: Dutch): 'de grammaticale fouten die het grappig maken: is vertaald solche Einsätze gehen nicht am!'
    at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1509)
    at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1400)
    at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1361)
    at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:919)
    ... 10 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 90
    at java.lang.String.substring(String.java:1963)
    at org.languagetool.rules.AbstractSimpleReplaceRule2.match(AbstractSimpleReplaceRule2.java:234)
    at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:965)
    at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1466)
    ... 13 more
```

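The innermost frame is a `String.substring` call inside `AbstractSimpleReplaceRule2.match`. One plausible way such an index overflow can arise (an assumption; the thread does not show the rule's source) is computing offsets on the original sentence, soft hyphens included, and then slicing a shorter copy with the soft hyphens removed. The hypothetical sketch below reproduces that failure mode and shows a defensive variant that maps the offsets before slicing. None of these names come from LanguageTool.

```java
// Hypothetical illustration of an offset mismatch; this is NOT the actual
// code of AbstractSimpleReplaceRule2, only a sketch of the failure mode.
public class OffsetMismatch {
    static final String SOFT_HYPHEN = "\u00AD";

    // Copy of the text with soft hyphens removed (shorter than the original).
    static String stripSoftHyphens(String s) {
        return s.replace(SOFT_HYPHEN, "");
    }

    // Maps an offset in the original text to the same position in the
    // stripped text by subtracting the soft hyphens that precede it.
    static int toStrippedOffset(String original, int originalOffset) {
        int removed = 0;
        for (int i = 0; i < originalOffset; i++) {
            if (original.charAt(i) == '\u00AD') {
                removed++;
            }
        }
        return originalOffset - removed;
    }

    // Buggy: applies original-text offsets directly to the stripped text.
    static String sliceUnmapped(String original, int start, int end) {
        return stripSoftHyphens(original).substring(start, end);
    }

    // Safer: converts both offsets before slicing the stripped text.
    static String sliceMapped(String original, int start, int end) {
        return stripSoftHyphens(original).substring(
                toStrippedOffset(original, start), toStrippedOffset(original, end));
    }

    public static void main(String[] args) {
        String original = "gramma\u00ADti\u00ADcale fouten";
        int start = original.indexOf("fouten"); // 15 in the original text
        int end = start + "fouten".length();    // 21, but the stripped text has length 19

        try {
            sliceUnmapped(original, start, end);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unmapped offsets: " + e.getClass().getSimpleName());
        }
        System.out.println("mapped offsets: " + sliceMapped(original, start, end)); // fouten
    }
}
```

Each soft hyphen shifts every later offset by one, so a long sentence with several of them can easily push an end index past the shorter string's length, as in the `String index out of range: 90` message.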
ghost commented 4 years ago

Ah. That explains it. I just found a text that contains no Cyrillic and still causes the same crash: zeegroenten spirulina chlorella nori zeewier paddenstoelen agaricus shitake reishi overigen groene theeextract decaf guar gom.
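To check whether a crashing input like this one also hides invisible characters, a small Java helper can list every code point in Unicode general category Cf (format characters, which include U+00AD SOFT HYPHEN) together with its index. This is a diagnostic sketch with a hypothetical class name; it makes no claim about what this particular sentence contains.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: finds invisible format (Cf) characters in a text.
public class InvisibleCharScanner {
    // Returns "index:U+XXXX" for each format code point, where index is the
    // UTF-16 char offset at which the code point starts.
    static List<String> findFormatChars(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.getType(cp) == Character.FORMAT) {
                hits.add(i + ":U+" + String.format("%04X", cp));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return hits;
    }

    public static void main(String[] args) {
        // Pass the suspicious text as the first argument, or use the sample.
        String sample = args.length > 0 ? args[0] : "gramma\u00ADti\u00ADcale";
        System.out.println(findFormatChars(sample)); // [6:U+00AD, 9:U+00AD] for the sample
    }
}
```

An empty list means the input has no Cf characters, in which case the crash would need a different explanation than the soft hyphens found above.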