tinkerwell / jodd

Automatically exported from code.google.com/p/jodd

Jerry cannot parse Spring API HTML file #27

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The following line throws an exception:
Jerry doc = Jerry.jerry(FileUtil.readString(file));

Exception:

Exception in thread "main" jodd.lagarto.LagartoException: Illegal character []] (state: 4)
    at jodd.lagarto.Lexer.nextToken(Lexer.java:1423)
    at jodd.lagarto.LagartoLexer.nextToken(LagartoLexer.java:1)
    at jodd.lagarto.LagartoParserEngine.nextToken(LagartoParserEngine.java:651)
    at jodd.lagarto.LagartoParserEngine.skipWhiteSpace(LagartoParserEngine.java:665)
    at jodd.lagarto.LagartoParserEngine.parseAttribute(LagartoParserEngine.java:562)
    at jodd.lagarto.LagartoParserEngine.parseTagAndAttributes(LagartoParserEngine.java:477)
    at jodd.lagarto.LagartoParserEngine.parseTag(LagartoParserEngine.java:419)
    at jodd.lagarto.LagartoParserEngine.parse(LagartoParserEngine.java:165)
    at jodd.lagarto.LagartoParserEngine.parse(LagartoParserEngine.java:120)
    at jodd.lagarto.dom.LagartoDOMBuilder.doParse(LagartoDOMBuilder.java:218)
    at jodd.lagarto.dom.LagartoDOMBuilder.parse(LagartoDOMBuilder.java:201)
    at jodd.lagarto.dom.jerry.Jerry$JerryParser.parse(Jerry.java:106)
    at jodd.lagarto.dom.jerry.Jerry.jerry(Jerry.java:53)
    at wjw.test.jodd.wot.HtmlParseTest3.main(HtmlParseTest3.java:16)
//============================
Test code:

import java.io.File;
import java.io.IOException;

import jodd.io.FileUtil;
import jodd.lagarto.dom.jerry.Jerry;
import jodd.lagarto.dom.jerry.JerryFunction;

public class HtmlParseTest3 {
  public static void main(String[] args) throws IOException {
    File file = new File("test-data/Validator.html");

    long startTime = System.currentTimeMillis();
    // create Jerry, i.e. document context
    Jerry doc = Jerry.jerry(FileUtil.readString(file));
    System.out.println("use time:" + ((System.currentTimeMillis() - startTime) / 1000));

    //parse
    doc.$("a").each(new JerryFunction() {
      public boolean onNode(Jerry $this, int index) {
        System.out.println("-----");
        System.out.println($this.html());
        System.out.println($this.get()[0].getHtml());
        return false;
      }
    });
  }

}

Original issue reported on code.google.com by wjw465...@gmail.com on 21 Sep 2012 at 7:23


GoogleCodeExporter commented 9 years ago
Found total 3 files:

spring-framework-3\docs\javadoc-api\org\springframework\validation\Validator.html error:jodd.lagarto.LagartoException: Illegal character []] (state: 4)
spring-framework-3\docs\javadoc-api\org\springframework\web\portlet\mvc\AbstractFormController.html error:jodd.lagarto.LagartoException: Illegal character []] (state: 4)
spring-framework-3\docs\javadoc-api\org\springframework\web\portlet\util\PortletUtils.html error:jodd.lagarto.LagartoException: Illegal character []] (state: 4)
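
For anyone who wants to reproduce this kind of scan over a whole javadoc tree, here is a minimal sketch. The class and method names (ParseFailureScan, findFailures) are hypothetical and not part of Jodd. The parser is passed in as a plain Consumer<String>, so with Jodd on the classpath one could presumably pass html -> Jerry.jerry(html). The sketch assumes the parser reports failures with an unchecked exception, as the stack trace in this issue suggests, and that the files can be read as UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Jodd: finds HTML files that a parser rejects.
final class ParseFailureScan {

  // Walks 'root', feeds every *.html file to 'parser', and returns
  // file -> error text for each file where the parser threw.
  static Map<Path, String> findFailures(Path root, Consumer<String> parser) throws IOException {
    List<Path> htmlFiles;
    try (Stream<Path> paths = Files.walk(root)) {
      htmlFiles = paths
          .filter(Files::isRegularFile)
          .filter(p -> p.getFileName().toString().endsWith(".html"))
          .sorted()
          .collect(Collectors.toList());
    }

    Map<Path, String> failures = new LinkedHashMap<>();
    for (Path file : htmlFiles) {
      // Assumption: UTF-8 is good enough for the scan; malformed bytes are replaced, not rejected.
      String html = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
      try {
        parser.accept(html);
      } catch (RuntimeException e) {
        // Assumption: the parser signals errors with unchecked exceptions.
        failures.put(file, e.toString());
      }
    }
    return failures;
  }
}
```

Taking the parser as a parameter keeps the scan independent of any particular Jodd version, which makes it easy to rerun after a fix.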

Original comment by wjw465...@gmail.com on 21 Sep 2012 at 7:42


GoogleCodeExporter commented 9 years ago
I found the problem: lexer.flex cannot handle the "pre" and "code" tags. I added them to
lexer.flex and the test now passes.

Original comment by wjw465...@gmail.com on 21 Sep 2012 at 9:14


GoogleCodeExporter commented 9 years ago
Perfect, will take a look as soon as we migrate! Thank you!!!

Original comment by i...@jodd.org on 21 Sep 2012 at 10:31

GoogleCodeExporter commented 9 years ago
I found that the LagartoParserEngine.parseSpecialTag() method also needs to change:
it needs statements that handle "pre" and "code".

Code fragment:
protected void parseSpecialTag(int state) throws IOException {
    int start = lexer.position() + 1;
    nextToken();
    int end = start + lexer.length();
    // Each subtracted constant matches the length of the corresponding closing tag.
    switch (state) {
        case Lexer.XMP:
            visitor.xmp(tag, input.subSequence(start, end - 6));     // "</xmp>"
            break;
        case Lexer.SCRIPT:
            visitor.script(tag, input.subSequence(start, end - 9));  // "</script>"
            break;
        case Lexer.STYLE:
            visitor.style(tag, input.subSequence(start, end - 8));   // "</style>"
            break;
    }
}
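
The constants 6, 9 and 8 in the fragment equal the lengths of "</xmp>", "</script>" and "</style>". Assuming that end points just past the closing tag, the offset for any further special tag could be derived from the tag name instead of being hard-coded. The helper below is a hypothetical sketch of that arithmetic only (SpecialTagBody and body are illustrative names, not Jodd API), and it does not claim to be how the fix was eventually implemented.

```java
// Hypothetical helper, not part of Jodd: derives the closing-tag offset from the tag name.
final class SpecialTagBody {

  // Returns the text between the opening tag and the closing tag.
  // 'start' is the index of the first body character; 'end' is assumed to be
  // the index just past the closing tag, e.g. just past "</script>".
  static CharSequence body(CharSequence input, int start, int end, String tagName) {
    int closingTagLength = "</".length() + tagName.length() + ">".length();
    return input.subSequence(start, end - closingTagLength);
  }
}
```

For example, with the input "<script>var a = 1;</script>", start 8 and end 27, body(input, 8, 27, "script") returns "var a = 1;". By the same arithmetic, "pre" would give an offset of 6 and "code" an offset of 7.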

Original comment by wjw465...@gmail.com on 21 Sep 2012 at 1:18

GoogleCodeExporter commented 9 years ago

Original comment by i...@jodd.org on 27 Sep 2012 at 7:36

GoogleCodeExporter commented 9 years ago
see:
https://github.com/oblac/jodd/pull/1
https://github.com/oblac/jodd/pull/2

Original comment by i...@jodd.org on 27 Sep 2012 at 9:06

GoogleCodeExporter commented 9 years ago
Fixed on GitHub.

Original comment by i...@jodd.org on 27 Sep 2012 at 9:38