html5lib / html5lib-tests

Testsuite data for html5lib, including the de-facto standard HTML parsing tests.
MIT License
188 stars 58 forks source link

Add list of parsers that keep up with this testsuite in CI #141

Open untitaker opened 2 years ago

untitaker commented 2 years ago

I want to add some tokenizer testcases, and so far have been field-testing them on html5lib-python, but would like to know if there are any parsers out there that would break on those testcases. Particularly because html5lib-python's test harness doesn't seem to compare error locations, and I'm about to add some tests that are all about error locations.

It would be nice to start collecting parsers that use any part of this testsuite in CI, in some sort of document, e.g. README. That would solve my usecase and also make it easier to discover conformant parsers for various languages.

Another idea would be to add all of those parsers to this repo's CI, but that sounds like too much effort

untitaker commented 2 years ago

https://sourcegraph.com/search?q=context:global+html5lib-tests+file:.gitmodules&patternType=literal

JKingweb commented 1 year ago

For what it's worth my parser does test error locations, and includes patches for tokenizer tests because some of the error locations seem wrong but there doesn't seem to be any interest in ironing that out.

I don't use CI, though, so nothing would break immediately. I also don't use submodules, so it's not included in that Sourcegraph search.

flavorjones commented 1 year ago

👋 Nokogiri (https://github.com/sparklemotion/nokogiri), a Ruby library that includes HTML5 parsing, uses this test suite. We do regular upstream builds against this repository's master branch, so we would catch any breakage in a day or two.

I'd be happy to do the legwork to add Nokogiri's test suite in a "downstream" CI job on this repository if that would be useful.

sideshowbarker commented 1 year ago

👋 Nokogiri (sparklemotion/nokogiri), a Ruby library that includes HTML5 parsing, uses this test suite. We do regular upstream builds against this repository's master branch, so we would catch any breakage in a day or two.

I'd be happy to do the legwork to add Nokogiri's test suite in a "downstream" CI job on this repository if that would be useful.

That sounds great. If you can provide the details about what would be necessary upstream here in this repo, I can work with you to get the necessary changes landed here.

Assuming we’d want to work from a PR: I’m a member of the html5lib org — though not full admin or maintainer perms — and I have perms to approve and merge PRs (write/push perms).

But if any changes to the settings for the repo are needed — in contrast to what we could get done in a PR — then at that point we’d need to have somebody with admin/owner perms make any necessary settings changes.

untitaker commented 1 year ago

Is this something we want to constrain to a handful of parsers? I am more than happy to also add a job for my parser html5gum but it pales in relevance compared to nokogiri.

I would add all of those CI jobs as non-required, because parsers can indeed be sometimes broken and updating the testsuite shouldn't be blocked on fixing them. As such I don't think repo settings changes are required.

sideshowbarker commented 1 year ago

Is this something we want to constrain to a handful of parsers?

As far as I understand it, No, we have no reason to constrain it to just a particular/small number of parsers.

I am more than happy to also add a job for my parser html5gum but it pales in relevance compared to nokogiri.

I think the relevance shouldn’t be a factor in deciding. It seems to me the only factors in deciding should be: whether the parser project has somebody willing to contribute the code for running CI for their parser — and that CI for their parser is passing rather than failing — and whether they can commit to continuing to keeping that up to date.

But even in the worst case if somebody no longer has time to maintain their CI contribution here — and their CI is failing rather than passing — then we just yank it out here and quit running it.

But in general, in my opinion at least, the more, the merrier (running more parsers through the test suite here under CI is better for html5lib than running fewer).

sideshowbarker commented 1 year ago

For what it's worth my parser does test error locations, and includes patches for tokenizer tests because some of the error locations seem wrong but there doesn't seem to be any interest in ironing that out.

I don't use CI, though, so nothing would break immediately. I also don't use submodules, so it's not included in that Sourcegraph search.

If you don’t have existing CI set up but could help write something to run under a GitHub Actions workflow here, than that would be great.

untitaker commented 1 year ago

I have a PR here to add my parser: https://github.com/html5lib/html5lib-tests/pull/152

for some reason it doesn't show any status

it does run as part of GHA within my own fork: https://github.com/untitaker/html5lib-tests/actions/runs/4386180990/jobs/7679878268

sideshowbarker commented 1 year ago

I have a PR here to add my parser: #152

for some reason it doesn't show any status

it does run as part of GHA within my own fork: untitaker/html5lib-tests/actions/runs/4386180990/jobs/7679878268

I guess that’s most likely due to some repository setting here. I’ll try to get it figured out.

flavorjones commented 1 year ago

@sideshowbarker Github Actions won't run a new workflow unless that workflow file already exists on the default branch.

Let me ship a PR with an skeleton "downstream" workflow file that you can review and merge, and then @untitaker and I (and others) can modify that workflow in subsequent PRs? I think I can do that tonight.

flavorjones commented 1 year ago

I've created #153 as that skeleton workflow file.

sideshowbarker commented 1 year ago

@sideshowbarker Github Actions won't run a new workflow unless that workflow file already exists on the default branch.

Aha, that makes sense

Let me ship a PR with an skeleton "downstream" workflow file that you can review and merge, and then @untitaker and I (and others) can modify that workflow in subsequent PRs?

I've created #153 as that skeleton workflow file.

Super — thanks, now merged

untitaker commented 1 year ago

Thank you for setting this up, I have updated https://github.com/html5lib/html5lib-tests/pull/152 to look like #154

fb55 commented 1 year ago

Awesome idea! I've opened a PR for parse5 at #155.

untitaker commented 1 year ago

hello @sideshowbarker, any update on this? I think the three above PRs are pretty much ready to go

flavorjones commented 1 year ago

@untitaker I wanted to check if you were still considering merging the downstream test suites at #152, #154, and #155? Do you have any concerns that we can address?

untitaker commented 1 year ago

I have no permissions on this repo, I am waiting for somebody else to merge them

On Sat, May 20, 2023, at 23:31, Mike Dalessio wrote:

@untitaker https://github.com/untitaker I wanted to check if you were still considering merging the downstream test suites at #152 https://github.com/html5lib/html5lib-tests/pull/152, #154 https://github.com/html5lib/html5lib-tests/pull/154, and #155 https://github.com/html5lib/html5lib-tests/pull/155? Do you have any concerns that we can address?

— Reply to this email directly, view it on GitHub https://github.com/html5lib/html5lib-tests/issues/141#issuecomment-1556016243, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGMPRLZYF6L4IEAGPTOHFLXHEZ3PANCNFSM5MC6MAMQ. You are receiving this because you were mentioned.Message ID: @.***>

untitaker commented 1 year ago

ping @gsnedders @sideshowbarker again, can somebody make a decision on whether we want to merge above PRs for CI?

flavorjones commented 1 year ago

@untitaker Sorry about that -- I meant to ping @sideshowbarker, sorry for the notification noise.

flavorjones commented 1 year ago

@sideshowbarker Are you still open merging the downstream test suites at #152, #154, and #155? Do you have any concerns that we can address?

flavorjones commented 1 year ago

@sideshowbarker Are you still open to merging #152, #154, and #155? Do you have any concerns with this approach, or those PRs, that we should address?

sideshowbarker commented 1 year ago

I have no objections to merging them and I’ve now approved them all — but the problem continues to be that every PR has a stalled “lint Expected — Waiting for status to be reported” check, and I don’t have sufficient perms in this repo to force-merge the PRs without the check being green.

untitaker commented 1 year ago

I've merged master into my branch, that should fix it. I think maintainers can merge master into arbitrary PRs to resolve this too

fb55 commented 1 year ago

I've updated #155 as well. Happy to see this move forward!

flavorjones commented 1 year ago

Thank you!