html5lib / html5lib-tests

Testsuite data for html5lib, including the de-facto standard HTML parsing tests.
MIT License

Add testcase for open attribute value #140

Closed: untitaker closed this 2 years ago

untitaker commented 2 years ago

This is a curious testcase because html5ever appears to fail it. Piping the input to html5ever's tokenize example produces:

$ echo -n -e "<D/0=&\r0='>" | cargo run --example tokenize
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/examples/tokenize`
ERROR: Bad character
TAG  : <d 0='&0=''>
OTHER: EOFToken

Tokenizer profile, in nanoseconds

       23885         total in token sink

       44610         total in tokenizer
       27019  60.6%  AttributeValue(Unquoted)
        5158  11.6%  Data
        4669  10.5%  TagOpen
        2174   4.9%  TagName
        1753   3.9%  SelfClosingStartTag
        1413   3.2%  BeforeAttributeName
        1382   3.1%  BeforeAttributeValue
        1042   2.3%  AttributeName

I could not find an older revision of the spec (W3C or WHATWG) that would explain this behavior.

It's late and I'm tired, but I believe my reading of the spec is correct and this should not emit a tag.
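
For anyone who wants to check that reading, here is a minimal Python sketch of the tokenizer states this input passes through. It is not html5ever's or html5gum's code, and `tokenize_tags` is a hypothetical helper. It assumes `&` stays literal (true here, since the character after `&` is not alphanumeric or `#`), ignores end tags, comments and duplicate-attribute handling, and only reports start tags.

```python
WS = "\t\n\f "


def tokenize_tags(html):
    """Return (start_tags, eof_in_tag) for a small subset of the HTML tokenizer."""
    # Input stream preprocessing: CR and CRLF are normalized to LF.
    text = html.replace("\r\n", "\n").replace("\r", "\n")
    tags, tag, state, i = [], None, "data", 0

    def emit():
        tags.append((tag["name"], [tuple(a) for a in tag["attrs"]]))
        return "data"

    while i < len(text):
        c = text[i]
        i += 1
        if state == "data":
            if c == "<":
                state = "tag_open"
        elif state == "tag_open":
            if c.isascii() and c.isalpha():
                tag, state = {"name": c.lower(), "attrs": []}, "tag_name"
            else:  # simplification: anything else is treated as plain data
                state, i = "data", i - 1
        elif state == "tag_name":
            if c in WS:
                state = "before_attr_name"
            elif c == "/":
                state = "self_closing"
            elif c == ">":
                state = emit()
            else:
                tag["name"] += c.lower()
        elif state in ("self_closing", "after_value_quoted"):
            if c == ">":
                state = emit()
            elif state == "after_value_quoted" and c == "/":
                state = "self_closing"
            elif state == "after_value_quoted" and c in WS:
                state = "before_attr_name"
            else:  # parse error: reconsume in before-attribute-name
                state, i = "before_attr_name", i - 1
        elif state in ("before_attr_name", "after_attr_name"):
            if c in WS:
                pass
            elif c == "/" and state == "after_attr_name":
                state = "self_closing"
            elif c == "=" and state == "after_attr_name":
                state = "before_attr_value"
            elif c == ">" and state == "after_attr_name":
                state = emit()
            elif c in "/>":  # before-attribute-name: reconsume in after-attribute-name
                state, i = "after_attr_name", i - 1
            elif c == "=" and state == "before_attr_name":
                tag["attrs"].append(["=", ""])  # parse error: "=" starts the name
                state = "attr_name"
            else:  # start a new attribute and reconsume in attribute-name
                tag["attrs"].append(["", ""])
                state, i = "attr_name", i - 1
        elif state == "attr_name":
            if c in WS or c in "/>":
                state, i = "after_attr_name", i - 1
            elif c == "=":
                state = "before_attr_value"
            else:
                tag["attrs"][-1][0] += c.lower()
        elif state == "before_attr_value":
            if c in WS:
                pass
            elif c == '"':
                state = "value_dq"
            elif c == "'":
                state = "value_sq"
            elif c == ">":
                state = emit()
            else:
                state, i = "value_unquoted", i - 1
        elif state in ("value_dq", "value_sq"):
            if c == ('"' if state == "value_dq" else "'"):
                state = "after_value_quoted"
            else:
                tag["attrs"][-1][1] += c
        elif state == "value_unquoted":
            if c in WS:
                state = "before_attr_name"
            elif c == ">":
                state = emit()
            else:
                tag["attrs"][-1][1] += c
    # EOF inside a tag discards the pending tag token (eof-in-tag).
    return tags, state not in ("data", "tag_open")
```

With this sketch, `tokenize_tags("<D/0=&\r0='>")` returns `([], True)`. The normalized `\r` ends the unquoted value `&`, the second `0=` opens a single-quoted value, and the trailing `>` lands inside that value, so the input ends at EOF without emitting a tag.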

untitaker commented 2 years ago

Forgot to mention, I'm running afl to find inconsistencies between html5gum and html5ever now. This is why I decided to create a separate file, to not clutter existing files as I add more to this. Do you think this belongs in one of the existing files?

untitaker commented 2 years ago

I believe this testcase may add nothing new to the testsuite. I assumed it would not overlap with existing test data because it exposes a flaw in html5ever. However, html5ever is lagging 4 years behind the testsuite's main branch; see https://github.com/servo/html5ever/issues/459

Not sure what to do with this or how to best check that this testcase adds something new.

untitaker commented 2 years ago

Never mind, I think this should be covered already.