validator / htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/

HTML parser throws AssertionErrors when -enableassertions is set #12

Open veita opened 4 years ago

veita commented 4 years ago

HTML parser throws AssertionErrors when -enableassertions is set. This makes it difficult to use the validator in test environments where assertions are often enabled.

Caused by: java.lang.AssertionError: strBufLen not reset after previous use!
    at nu.validator.htmlparser.impl.Tokenizer.clearStrBufBeforeUse(Tokenizer.java:852)
    at nu.validator.htmlparser.impl.Tokenizer.stateLoop(Tokenizer.java:1561)
    at nu.validator.htmlparser.impl.Tokenizer.tokenizeBuffer(Tokenizer.java:1341)
    at nu.validator.htmlparser.io.Driver.runStates(Driver.java:320)
    at nu.validator.htmlparser.io.Driver.tokenize(Driver.java:219)
    at nu.validator.htmlparser.sax.HtmlParser.tokenize(HtmlParser.java:488)
    at nu.validator.htmlparser.sax.HtmlParser.parse(HtmlParser.java:408)
    at nu.validator.xml.WiretapXMLReaderWrapper.parse(WiretapXMLReaderWrapper.java:158)
    at nu.validator.validation.SimpleDocumentValidator.checkAsHTML(SimpleDocumentValidator.java:523)
    at nu.validator.validation.SimpleDocumentValidator.checkHtmlInputSource(SimpleDocumentValidator.java:405)
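
This is not a fix for the parser itself, but a possible stopgap for test environments is to keep assertions enabled globally and switch them off only for the parser's packages. On the command line that would be something like `-ea -da:nu.validator.htmlparser...` (the trailing `...` is part of the JVM syntax and covers subpackages). The same can be done programmatically, as in the minimal sketch below. `AssertionOptOut` and `Lib` are hypothetical names used only for illustration, with `Lib` standing in for a library class that contains internal asserts. The package name in the comment is taken from the stack trace above.

```java
// Sketch only: Lib is a hypothetical stand-in for a library class with internal asserts.
final class AssertionOptOut {

    /** Stand-in for a third-party class (e.g. a tokenizer) that uses assert internally. */
    static final class Lib {
        static boolean assertionsEnabled() {
            boolean enabled = false;
            // The assignment only executes when assertions are enabled for this class.
            assert enabled = true;
            return enabled;
        }
    }

    /**
     * Turns assertions off for Lib only, then reports Lib's effective status.
     * The status is fixed when a class is initialized, so the opt-out must
     * run before Lib is first used.
     */
    static boolean libAssertionsAfterOptOut() {
        // Lib.class loads the class but does not initialize it.
        ClassLoader loader = Lib.class.getClassLoader();
        loader.setClassAssertionStatus(Lib.class.getName(), false);
        // For a real library, the package-level equivalent would be along the lines of:
        // loader.setPackageAssertionStatus("nu.validator.htmlparser", false);
        return Lib.assertionsEnabled();
    }

    public static void main(String[] args) {
        System.out.println("Lib assertions enabled: " + libAssertionsAfterOptOut());
    }
}
```

The opt-out only affects classes that have not been initialized yet, so in a test suite it would have to run before the first parser class is touched, for example in a static initializer of a base test class. It hides the AssertionError rather than resolving whatever leaves strBufLen in an unexpected state.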

carlosame commented 4 years ago

I'm consistently getting the "strBufLen not reset after previous use!" error in my integration tests with versions 1.4 through 1.4.16 (I haven't tested earlier versions). The document currently at https://vk.com/htmlstrap, for example, can be used to reproduce it. Could this be fixed?

Here is the 1.4.16 stack trace:

java.lang.AssertionError: strBufLen not reset after previous use!
    at htmlparser@1.4.16/nu.validator.htmlparser.impl.Tokenizer.clearStrBufBeforeUse(Tokenizer.java:940)
    at htmlparser@1.4.16/nu.validator.htmlparser.impl.Tokenizer.stateLoop(Tokenizer.java:3847)
    at htmlparser@1.4.16/nu.validator.htmlparser.impl.Tokenizer.tokenizeBuffer(Tokenizer.java:1429)
    at htmlparser@1.4.16/nu.validator.htmlparser.io.Driver.runStates(Driver.java:320)
    at htmlparser@1.4.16/nu.validator.htmlparser.io.Driver.tokenize(Driver.java:219)
    at htmlparser@1.4.16/nu.validator.htmlparser.dom.HtmlDocumentBuilder.tokenize(HtmlDocumentBuilder.java:263)
    at htmlparser@1.4.16/nu.validator.htmlparser.dom.HtmlDocumentBuilder.parse(HtmlDocumentBuilder.java:312)

Should I file it as a separate bug?

carlosame commented 4 years ago

When I download the previously mentioned URL with Firefox, I get a different document, so I am uploading the one I use in my tests: vk_com.html.gz