validator / htmlparser

The Validator.nu HTML parser https://about.validator.nu/htmlparser/
Other
56 stars 26 forks source link

Investigate failing math-element tree-construction tests #33

Closed sideshowbarker closed 3 years ago

sideshowbarker commented 4 years ago

We’re failing four test cases in https://github.com/html5lib/html5lib-tests/blob/master/tree-construction/math.dat (see below). I don’t understand why the Java parser is failing these but the Firefox parser isn’t.

These seem related to whatwg/html@8cf34ad “Handle foster-parenting of and elements”, which as far as I can see, we maybe never implemented support for in the Java sources — in which case, it would be expected to see the tests failing just as I’ve observed them to be.

But if so, it’s again a mystery to me why these tests would be passing in Firefox but not when run with the Java parser.

Data:
<math><tr><td><mo><tr>
Expected:
| <math math>
|   <math tr>
|     <math td>
|       <math mo>
Got:
| <math math>
|   <math tr>
|     <math td>
|       <math mo>
|   <math tr>
Expected errors:
Actual errors:
6: Start tag “math” seen in “table”.
22: End of file seen and there were open elements.
22: Unclosed element “tr”.
6: Unclosed element “math”.
-----------------------------------------
Data:
<math><thead><mo><tbody>
Expected:
| <math math>
|   <math thead>
|     <math mo>
Got:
| <math math>
|   <math thead>
|     <math mo>
|   <math tbody>
Expected errors:
Actual errors:
6: Start tag “math” seen in “table”.
24: End of file seen and there were open elements.
24: Unclosed element “tbody”.
6: Unclosed element “math”.
-----------------------------------------
Data:
<math><tfoot><mo><tbody>
Expected:
| <math math>
|   <math tfoot>
|     <math mo>
Got:
| <math math>
|   <math tfoot>
|     <math mo>
|   <math tbody>
Expected errors:
Actual errors:
6: Start tag “math” seen in “table”.
24: End of file seen and there were open elements.
24: Unclosed element “tbody”.
6: Unclosed element “math”.
-----------------------------------------
Data:
<math><tbody><mo><tfoot>
Expected:
| <math math>
|   <math tbody>
|     <math mo>
Got:
| <math math>
|   <math tbody>
|     <math mo>
|   <math tfoot>
Expected errors:
Actual errors:
6: Start tag “math” seen in “table”.
24: End of file seen and there were open elements.
24: Unclosed element “tfoot”.
6: Unclosed element “math”.
sideshowbarker commented 4 years ago

I don’t understand why the Java parser is failing these but the Firefox parser isn’t

So it turns out that Firefox is in fact failing these tests; see the following:

The reason I hadn’t previously realized that Firefox fails those is that I wasn’t correctly testing them when I tried to run them manually. I had tried them as normal parsing tests, but they’re actually fragment-parsing tests — so they need to be run through the fragment parser, with a particular element as the context element.

And the reason the failures are going unnoticed as far as Firefox CI is that they’ve been flagged as expected failures:

sideshowbarker commented 4 years ago

Analogous fragment parsing of SVG content exhibits the same bug; see https://github.com/html5lib/html5lib-tests/pull/129

sideshowbarker commented 4 years ago

OK, I think I've managed to isolate the cause of this, and a possible fix. I'll test it more and see. Looking at the part of the code where I think the cause is, it's not clear to me yet how it maps to the current spec requirements. It's a code condition that was added 12 years ago, so I'm also going to look at the history to try to discern where it originally came from.

sideshowbarker commented 4 years ago

OK, I think I've managed to isolate the cause of this,

I was mistaken — the code that I thought I’d isolated it to turned out to not actually be the cause after all.

So I’m back to investigating it further.

sideshowbarker commented 4 years ago

OK, I finally managed to figure out the cause, and #42 has a fix.