oblac / jodd

Jodd! Lightweight. Java. Zero dependencies. Use what you like.
https://jodd.org
BSD 2-Clause "Simplified" License
4.06k stars 723 forks source link

java.lang.IndexOutOfBoundsException #763

Closed marekhudyma closed 4 years ago

marekhudyma commented 4 years ago

Current behavior

I got java.lang.IndexOutOfBoundsException while parsing http://cnn.com page:

I use:

    <dependency>
      <groupId>org.jodd</groupId>
      <artifactId>jodd-lagarto</artifactId>
      <version>5.1.4</version>
    </dependency>

I use Java 11.

Exception that I have:

Exception in thread "main" java.lang.IndexOutOfBoundsException
    at jodd.util.CharArraySequence.<init>(CharArraySequence.java:69)
    at jodd.util.CharArraySequence.of(CharArraySequence.java:47)
    at jodd.lagarto.Scanner.charSequence(Scanner.java:142)
    at jodd.lagarto.LagartoParser.emitComment(LagartoParser.java:2977)
    at jodd.lagarto.LagartoParser$28.parse(LagartoParser.java:1411)
    at jodd.lagarto.LagartoParser.parse(LagartoParser.java:135)

Expected behavior

No exception.

Steps to Reproduce the Problem

My code:

        LagartoParser lagartoParser = new LagartoParser(html.toString());
        TagVisitorImpl tagVisitor = new TagVisitorImpl();
        lagartoParser.parse(tagVisitor);

        class TagVisitorImpl implements TagVisitor {
        @Override
        public void tag(Tag tag) {
            href = tag.getAttributeValue("href");
            if (href != null) {
                // ... 
            }
        }

Here I uploaded file which causes error, so it would be easy to reproduce:

https://www.dropbox.com/s/soqzjz2d7p33soo/0126_cnn.com.html?dl=0
slandelle commented 4 years ago

This is a bug with Jodd's IE conditional comments support:

<html><head>
<!--[if lte IE 9]><meta http-equiv="refresh" content="1;url=/2.218.0/static/unsupp.html" /><![endif]-->
<!--[if gt IE 9><!--><script>alert("Hello!");</script><!--<![endif]-->
</head>
</html>

The first line works fine, the second one crashes with IOOBE. A possible workaround is to disable IE conditional comments support in Lagarto config.

igr commented 4 years ago

Thanx @slandelle for fast detection! It seems that the second line has the syntax error for Conditional Comments, i.e. its missing the ]. I guess that in this case this is not a conditional comment, but regular one. Jodd should 'roll back' the conditional condition parsing and go with comment (just need to verify what happens in browser... internet explorer, anyone?:))).

slandelle commented 4 years ago

Let me check if I didn’t introduce a typo when simplifying cnn.com HTML content.

igr commented 4 years ago

I checked the code, you didnt make mistake @slandelle

igr commented 4 years ago

if I assume that [if gt IE 9> is a regular comment, should Jodd emit conditional comment on endif or regular comment (as the starting conditional comment does not exist)?

slandelle commented 4 years ago

Regular comment IMHO

slandelle commented 4 years ago

Thanks mate!

igr commented 4 years ago

Releasing new version @slandelle atm :)