web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Split out flake8 jobs from pytest jobs #14893

Open foolip opened 5 years ago

foolip commented 5 years ago

https://github.com/web-platform-tests/wpt/pull/14852#issuecomment-454789830 suggests that the flake8 results are hard to find, which matches my own experience. flake8 currently runs as part of the "tools/ unittests (Python 2)" and "tools/ unittests (Python 3)" jobs on Travis.

foolip commented 3 years ago

@jgraham would you be OK with this, even if it means another task on Taskcluster? Given the overhead of each task I presume fewer jobs are better, especially for very short-running tasks.

jgraham commented 3 years ago

I don't really object as such, but I wonder if we could start by improving the output so it's easier to find things in the logs.

foolip commented 3 years ago

The first thing that comes to mind is to not use tox but just invoke things directly. Tox adds a bunch of things to the output that make it harder to find the good stuff.

Also, if Taskcluster has some notion of steps within a task, that could help: pytest could be one step and flake8 another.
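To make the "invoke things directly, one step each" idea concrete, here is a minimal sketch, not an existing wpt script. The names `run_step` and `run_all` are hypothetical, and the banner format is just an assumption about what would be easy to search for in a raw log.

```python
import subprocess
import sys


def run_step(name, cmd):
    """Run one command under a clearly delimited banner and return its exit code."""
    print(f"===== BEGIN {name} =====", flush=True)
    result = subprocess.run(cmd)
    status = "PASS" if result.returncode == 0 else "FAIL"
    print(f"===== END {name}: {status} =====", flush=True)
    return result.returncode


def run_all(steps):
    """Run every step, even after a failure, and return the names of failed steps."""
    return [name for name, cmd in steps if run_step(name, cmd) != 0]


# Assumption: flake8 and pytest are installed in the current environment and
# are pointed at tools/; the real tox configuration may pass other arguments.
DEFAULT_STEPS = [
    ("flake8", [sys.executable, "-m", "flake8", "tools"]),
    ("pytest", [sys.executable, "-m", "pytest", "tools"]),
]
```

Calling `run_all(DEFAULT_STEPS)` would run both tools without tox in between, and the returned list says at a glance whether it was the lint step or the test step that failed.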

jgraham commented 3 years ago

Taskcluster doesn't have any built-in notion of steps within a task. Of course we can produce whatever artifacts we like, including ones that will be displayed in the GH UI. I'd suggest the most obvious thing to do here would be to write a post-processor that extracts the failures and displays them in the GH summary, rather than making people look at the logs in the common case.
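A rough sketch of what such a post-processor could look like for the flake8 part, assuming flake8 is run with its default output format (`path:row:col: CODE message`). The function names are hypothetical, and how the resulting text is published as an artifact or shown in the GH UI is deliberately left out.

```python
import re

# flake8's default output format is "path:row:col: CODE message".
FLAKE8_LINE = re.compile(
    r"^(?P<path>[^\s:]+):(?P<row>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def extract_flake8_failures(log_text):
    """Return one dict per flake8 violation found in a raw job log."""
    failures = []
    for line in log_text.splitlines():
        match = FLAKE8_LINE.match(line.strip())
        if match:
            failures.append(match.groupdict())
    return failures


def format_summary(failures):
    """Render the violations as a short Markdown list for a summary view."""
    if not failures:
        return "flake8: no problems found."
    lines = [f"flake8: {len(failures)} problem(s) found", ""]
    for f in failures:
        lines.append(
            f"- `{f['path']}:{f['row']}:{f['col']}` {f['code']} {f['text']}"
        )
    return "\n".join(lines)
```

Lines from tox or pytest that do not match the flake8 pattern are simply skipped, so the summary only contains the lint problems someone would otherwise have to dig out of the log.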

foolip commented 3 years ago

@jgraham what would you think of putting it in the lint job instead? I think it would be helpful if it's immediately clear that it's not a test failure, but rather something likely trivial. I, at least, have a higher bar for even opening a likely test failure to figure out what's wrong, so I put it off for longer.

jgraham commented 3 years ago

I mean sure, we can rearrange it like that. It might be better but I doubt it's going to solve the underlying problem (approximately: having to read actual logs causes a step change in the difficulties people experience when having to fix a job failure).