html5lib / html5lib-tests

Testsuite data for html5lib, including the de-facto standard HTML parsing tests.
MIT License
188 stars 61 forks source link

Implement tokenization errors as per spec. #92

Closed inikulin closed 7 years ago

inikulin commented 7 years ago

Ready for a review, but it makes sense to merge spec changes first. Note that for now we've decided to move new error codes to a separate section in tree construction stage tests to not mix things up. Once we have a spec for tree construction stage errors, we'll remove old errors and move new errors to #errors section.

Spec PR: https://github.com/whatwg/html/pull/2701

inikulin commented 7 years ago

Spec PR has been merged. This one is ready for review.

inikulin commented 7 years ago

Ping @gsnedders

inikulin commented 7 years ago

@gsnedders

Apart from the one issue, what's going on with the tree construction tests? Some of the #errors sections have been partially updated for the tokenizer errors, in addition to the #new-errors sections?

Unfortunately, seems like some error codes were erroneously added to the legacy error section and we missed it on review. I'll do a cleanup.

Update: I've figured out a cause of this issue: some error codes that didn't get into the final cut were the same as legacy error codes. During automatic rename they were replaced with new codes.

Personally, I have a slight preference against adding a new section temporarily, given I know some things rely on the specific headings

I thought it would be an unobtrusive way for a transition to the new parse errors. The other options are:

Do you have any other ideas?

gsnedders commented 7 years ago

@inikulin

Do you have any other ideas?

Not really. :) Adding a new section probably makes as much sense as anything else.

inikulin commented 7 years ago

@gsnedders Fixed

stevecheckoway commented 6 years ago

I haven't actually checked they match the spec, because that would take forever, and I'd rather just wait until some implementation notices they don't match.

For what it's worth, the code I'm testing, Nokogumbo, outputs each #new-error error message in exactly the same order as the tests.

(It also tests that the line numbers match and the column numbers in the test are at least as large as the column numbers Nokogumbo outputs. I can't test for column number equality because Nokogumbo outputs column numbers at a place that would be most useful for humans, not the column where the error was detected.)