web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Make wpt own the HTML parser test data and remove dependency on html5lib-python, html5lib-tests #27868

Open zcorpan opened 3 years ago

zcorpan commented 3 years ago

This week I've done the exercise of updating HTML parser tests again, though this time I was a bit more successful in figuring out how to get those changes through to wpt (see #2887). But boy is it painful and also mostly undocumented!

Juggling 3 repos for one change like this doesn't seem ideal for contributors. From wpt's perspective, what I would like instead is:

Then html5lib-python can get the tree-builder test data from wpt instead of from html5lib-tests.

Thoughts? @gsnedders @jgraham @annevk @stephenmcgruer

gsnedders commented 3 years ago

this is effectively a dupe of https://github.com/html5lib/html5lib-tests/issues/127 fwiw

zcorpan commented 3 years ago

@gsnedders oh, right, I had forgotten about that! It seems like there isn't objection. Are you still planning to work on this?

gsnedders commented 3 years ago

> @gsnedders oh, right, I had forgotten about that! It seems like there isn't objection. Are you still planning to work on this?

It is a long way down my list.

zcorpan commented 2 years ago

A tweak we can make is to have wpt depend on html5lib-tests directly instead of on html5lib-python, which would remove the second step. (I think this was @jgraham's idea, but I don't see it mentioned on GitHub.)

gsnedders commented 2 years ago

One obvious (easy) tweak, given it's using git submodules, is to explicitly store a commit hash somewhere in WPT and then, during update, run `cd html5lib-python/html5lib/tests/testdata && git fetch origin && git checkout $REV`.
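A rough sketch of what that update step could look like if scripted, assuming the pinned hash lives in a small text file inside WPT. The pin file and all function names here are hypothetical, not existing wpt tooling; only the testdata path and the two git commands come from the comment above.

```python
import re
import subprocess
from pathlib import Path
from typing import List

# Path taken from the comment above; where the pin file would live is an assumption.
TESTDATA_DIR = Path("html5lib-python/html5lib/tests/testdata")

_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def read_pinned_rev(text: str) -> str:
    """Return the commit hash stored in the (hypothetical) pin file's contents."""
    rev = text.strip()
    if not _FULL_SHA.fullmatch(rev):
        raise ValueError(f"not a full commit hash: {rev!r}")
    return rev


def update_commands(rev: str) -> List[List[str]]:
    """Commands equivalent to `git fetch origin && git checkout $REV`."""
    return [["git", "fetch", "origin"], ["git", "checkout", rev]]


def update_testdata(pin_file: Path, testdata_dir: Path = TESTDATA_DIR) -> None:
    """Check out the pinned revision inside the testdata submodule."""
    rev = read_pinned_rev(pin_file.read_text())
    for cmd in update_commands(rev):
        # check=True stops at the first failing command, like `&&` in a shell.
        subprocess.run(cmd, cwd=testdata_dir, check=True)
```

Requiring a full 40-character hash rather than a branch name keeps the update reproducible, which is the point of pinning.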

hsivonen commented 2 years ago

My main concern is that I want to preserve the file format as the preferred form for making modifications to the tests, since there are non-WPT consumers of those formats.

I'm not a fan of WPT having a build step that transforms the tree builder test format. FWIW, Gecko's mochitest harness stores the original .dat format in the repo and parses it when the tests are run.

zcorpan commented 2 years ago

Having the source files in the same format in wpt and parsing them with JS when running sounds ideal actually. Can that parser be migrated to wpt?
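For readers unfamiliar with the .dat format discussed above, here is a minimal sketch of a reader for it, under some assumptions. It is written in Python purely for illustration (the idea in this thread is a JS parser run by the harness), it is not Gecko's or html5lib's implementation, and the names `parse_dat` and `SECTION_HEADINGS` are made up. It assumes each test starts with a `#data` line and that section headings are lines such as `#errors` and `#document`; an input line that happens to equal one of those headings would be misread.

```python
from typing import Dict, Iterator, List, Optional

# Section headings assumed by this sketch.
SECTION_HEADINGS = frozenset({
    "data", "errors", "new-errors", "document-fragment",
    "script-on", "script-off", "document",
})


def _finish(sections: Dict[str, List[str]]) -> Dict[str, str]:
    """Join each section's lines, dropping the blank separator after the last one."""
    last = list(sections)[-1]
    while sections[last] and sections[last][-1] == "":
        sections[last].pop()
    return {name: "\n".join(lines) for name, lines in sections.items()}


def parse_dat(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per test, mapping section name to section text."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.split("\n"):
        heading = line[1:] if line.startswith("#") else None
        if heading in SECTION_HEADINGS:
            # A new "#data" heading closes the previous test, if any.
            if heading == "data" and sections:
                yield _finish(sections)
                sections = {}
            current = heading
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    if sections:
        yield _finish(sections)
```

Because the source files are read as they are, no build step or transformed copy is needed, which is the property both comments above ask for.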

annevk commented 1 year ago

Having worked on a parser bug in WebKit I now think this would be even more valuable than I previously thought. It looks like Chromium and WebKit both have two sets of parser tests in the tree:

And the former has tests the latter might not contain. I contributed further to this problem in https://github.com/WebKit/WebKit/pull/12019, but am willing to be part of the cleanup crew if we make web-platform-tests the true home of HTML parser tests.

I suspect @mfreed7 might be interested in this from the Chromium side. Copying here to gather interest.

mfreed7 commented 1 year ago

I'm definitely supportive of the effort to clean this up, and make WPT the source of truth for parser tests.

annevk commented 1 year ago

Steps taken thus far:

I wonder if @zcorpan is still interested in taking this even further as I think it would definitely be preferable if we didn't have to go via html5lib-tests.

https://github.com/html5lib/html5lib-tests does have a number of actionable issues and stale PRs worth triaging. Help appreciated.

zcorpan commented 1 year ago

Yes. See https://github.com/html5lib/html5lib-tests/issues/127#issuecomment-1490501826 and later comments.

annevk commented 1 year ago

@zcorpan any progress on this?

zcorpan commented 1 year ago

Not yet but it's on my list.