html5lib / html5lib-tests

Testsuite data for html5lib, including the de-facto standard HTML parsing tests.
MIT License

Logic of eof-in-comment error positions #125

Open JKingweb opened 4 years ago

JKingweb commented 4 years ago

I'm finishing up testing a tokenizer implementation that I've tried to make pass the test suite in its entirety, including the line and column positions for parse errors.

While counting supplementary-plane characters as two columns has some logic to it, I'm struggling to understand the logic behind the positions in the tests added by @hsivonen:

https://github.com/html5lib/html5lib-tests/pull/121/files

Other EOF errors (and indeed, other eof-in-comment errors) use the position of the EOF itself, while these use a position one character earlier. The position is also on line 1 rather than line 2, despite there being a line break just before the EOF.

Is this simply an oversight?
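
To make the two candidate positions concrete, here is a minimal sketch of the counting I have in mind. It is not taken from html5lib or from the test suite: the function names, the sample input, and the convention (1-based lines, columns counted as UTF-16 code units consumed on the current line, column reset to 0 after a line feed) are all assumptions made for illustration.

```python
from typing import Tuple


def utf16_width(ch: str) -> int:
    """Columns taken by one character when counting UTF-16 code units."""
    # Supplementary-plane characters need a surrogate pair, hence two units.
    return 2 if ord(ch) > 0xFFFF else 1


def position_after(text: str, count: int) -> Tuple[int, int]:
    """Return (line, column) after consuming `count` characters of `text`.

    Assumed convention (hypothetical, not confirmed by the test suite):
    lines are 1-based, the column is the number of UTF-16 code units
    consumed on the current line, and "\n" starts a new line at column 0.
    """
    line, col = 1, 0
    for ch in text[:count]:
        if ch == "\n":
            line, col = line + 1, 0
        else:
            col += utf16_width(ch)
    return line, col


# Hypothetical input: an unterminated comment ending in a line break.
sample = "<!--\U0001F600\n"

at_eof = position_after(sample, len(sample))          # position of the EOF itself
one_behind = position_after(sample, len(sample) - 1)  # one character before the EOF

print(at_eof)      # (2, 0)
print(one_behind)  # (1, 6)
```

Under these assumptions, reporting the error at the EOF gives a position on line 2, while reporting it one character behind gives a position on line 1, which is the kind of discrepancy I'm asking about.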