Open foolip opened 5 years ago
@jgraham would you be OK with this, even if it means another task on Taskcluster? Given the overhead of each task I presume fewer jobs are better, especially for very short-running tasks.
I don't really object as such, but I wonder if we could start by improving the output so it's easier to find things in the logs.
The first thing that comes to mind is to not use tox and just invoke things directly. Tox adds a lot of noise to the output, which makes it harder to find the good stuff.
Also, if Taskcluster has some notion of steps within a task, that could help: pytest could be one step and flake8 another.
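To make the "invoke things directly, one step per tool" idea concrete, here is a minimal sketch of a runner that calls each tool without tox and wraps its output in easy-to-search markers. The `run_steps` helper, the marker format, and the `tools/` arguments in the usage note are assumptions for illustration, not anything that exists in wpt today.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each (name, argv) step in order and return {name: exit code}.

    Every step is wrapped in BEGIN/END marker lines so its output is easy
    to locate in a long log. The marker format is made up for this sketch.
    """
    results = {}
    for name, argv in steps:
        print(f"===== BEGIN {name} =====", flush=True)
        # The child process inherits stdout/stderr, so its output appears
        # between the two marker lines.
        completed = subprocess.run(argv)
        print(
            f"===== END {name} (exit code {completed.returncode}) =====",
            flush=True,
        )
        results[name] = completed.returncode
    return results
```

A hypothetical invocation would be `run_steps([("flake8", [sys.executable, "-m", "flake8", "tools/"]), ("pytest", [sys.executable, "-m", "pytest", "tools/"])])`, with the job failing if any returned exit code is non-zero. All steps run even if an earlier one fails, so a lint error does not hide test results.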
Taskcluster doesn't have any built-in notion of steps within a task. Of course, we can produce whatever artifacts we like, including ones that will be displayed in the GH UI. I'd suggest the most obvious thing to do here would be to write a post-processor that extracts the failures and displays them in the GH summary, rather than making people look at the logs in the common case.
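A post-processor along those lines could start out as something like the sketch below. It assumes the log contains flake8's default output format (`path:row:col: CODE message`); the function names and the summary layout are hypothetical, and how the resulting text is published to the GH UI is not covered here.

```python
import re
from collections import defaultdict

# flake8's default output format is "path:row:col: CODE message".
FLAKE8_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<row>\d+):(?P<col>\d+): "
    r"(?P<code>[A-Z]+\d+) (?P<text>.*)$"
)


def extract_flake8_failures(log_text):
    """Return a list of dicts, one per flake8 violation found in the log."""
    failures = []
    for line in log_text.splitlines():
        match = FLAKE8_LINE.match(line.strip())
        if match:
            failure = match.groupdict()
            failure["row"] = int(failure["row"])
            failure["col"] = int(failure["col"])
            failures.append(failure)
    return failures


def format_summary(failures):
    """Render the violations as a short plain-text summary grouped by file."""
    if not failures:
        return "flake8: no problems found"
    by_path = defaultdict(list)
    for failure in failures:
        by_path[failure["path"]].append(failure)
    lines = [f"flake8: {len(failures)} problem(s) found", ""]
    for path in sorted(by_path):
        lines.append(path)
        for f in by_path[path]:
            lines.append(
                f"- line {f['row']}, col {f['col']}: {f['code']} {f['text']}"
            )
    return "\n".join(lines)
```

Lines that do not match the pattern, such as tox banners or pytest tracebacks, are ignored, so the summary contains only the lint violations. Paths that themselves contain a colon would not be matched by this pattern.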
@jgraham what would you think of putting it in the lint job instead? I think it would be helpful if it's immediately clear that it's not a test failure, but rather something likely trivial. I, at least, have a higher bar for even opening what looks like a test failure to figure out what's wrong, so I would put it off for longer.
I mean sure, we can rearrange it like that. It might be better but I doubt it's going to solve the underlying problem (approximately: having to read actual logs causes a step change in the difficulties people experience when having to fix a job failure).
https://github.com/web-platform-tests/wpt/pull/14852#issuecomment-454789830 suggests that finding the flake8 results is hard, which I have also found. They are now run as part of "tools/ unittests (Python 2)" and "tools/ unittests (Python 3)" on Travis.