malcolmgreaves / language-detection

Automatically exported from code.google.com/p/language-detection. Some after-the-fact modifications to get this working within sbt.
Apache License 2.0

Detector.append(Reader) throws StringIndexOutOfBoundsException #38

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

Reproduce with the following code:

    @Test
    public void langDetect(){
        final String textToDetect = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa my.email.address@email.com asdfadasdf";

        try {
            final URL profiles = Resources.getResource(getClass(), "profiles");
            LangDetector.init(new File(profiles.getPath()));

            final Detector detector = DetectorFactory.create();
            detector.append(new StringReader(textToDetect));

        } catch (LangDetectException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

What is the expected output? What do you see instead?
I expect anything but an exception.
I get this stacktrace: 

java.lang.StringIndexOutOfBoundsException: String index out of range: -1
    at java.lang.String.<init>(String.java:207)
    at com.cybozu.labs.langdetect.Detector.append(Detector.java:154)

What version of the product are you using? On what operating system?
A build from 2011-09-21. JRE 1.6b21

Please provide any additional information below.
I am attempting to use append(Reader) because the URL/address regex in 
append(String) will occasionally "freeze", as noted in issues 6 and 26 
(http://code.google.com/p/language-detection/issues/detail?id=26 and 
http://code.google.com/p/language-detection/issues/detail?id=6&q=append). Using 
the reader and buffer avoids the regex slowdown, but append(Reader) is unusable 
because of this out-of-bounds exception.

Original issue reported on code.google.com by Walter.E...@gmail.com on 17 Jul 2012 at 2:59

GoogleCodeExporter commented 9 years ago
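I think StringReader.ready() always returns true, since it will never block. At 
the end of the stream the loop therefore calls read() once more, gets -1, and 
passes that -1 to new String(buf, 0, length), which matches the stack trace 
above. The append method needs to assign length from the buffered read inside 
the loop condition and stop when it is -1: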

int length;
while (text.length() < max_text_length && reader.ready()
        && (length = reader.read(buf)) > -1) {
    // read() returns -1 at end of stream; only build the String for a real read
    append(new String(buf, 0, length));
}

Original comment by armin...@gmail.com on 17 Sep 2012 at 8:03
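
For anyone who wants to check this without the library on the classpath, here is a minimal, self-contained sketch of the behavior described in the comment above. It is not the library's code: the class ReaderAppendSketch, the method readAll, and the MAX_TEXT_LENGTH value are hypothetical stand-ins for Detector.append(Reader) and its internal limit. It only relies on java.io.StringReader, whose ready() reports true even after the last character has been consumed, while read() returns -1.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReaderAppendSketch {
    // Hypothetical cap standing in for the detector's max_text_length.
    static final int MAX_TEXT_LENGTH = 10000;

    // Reads until end of stream or until the cap is reached.
    // The result of read() is checked before it is used as a length.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buf = new char[MAX_TEXT_LENGTH / 2];
        int length;
        while (text.length() < MAX_TEXT_LENGTH && reader.ready()) {
            // Never request more characters than the cap still allows.
            int want = Math.min(buf.length, MAX_TEXT_LENGTH - text.length());
            length = reader.read(buf, 0, want);
            if (length < 0) {
                break; // end of stream: ready() was true, but there is nothing left
            }
            text.append(buf, 0, length);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("abc");
        char[] buf = new char[8];
        int first = reader.read(buf);    // consumes all three characters
        boolean ready = reader.ready();  // still true at end of stream
        int second = reader.read(buf);   // -1, the value that broke new String(buf, 0, length)
        System.out.println(first + " " + ready + " " + second);

        System.out.println(readAll(new StringReader("my.email.address@email.com asdfadasdf")));
    }
}
```

The loop uses an explicit break instead of folding the read into the while condition. Both forms stop on -1, so this is only a readability choice.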