adityaraute opened this issue 10 months ago
Thanks for the suggestion – I have had a variety of ideas like this in mind for a while, based on recommendations from other repositories. Adding labels such as `[skip tests]` and `[skip docs]` to group checks would be worthwhile and is easy to do with already-supported GitHub Actions syntax. The issue is long-term maintainability: the file globs needed to run checks conditionally become too complex to accommodate all such cases. For example, if I modify a docstring incorrectly in the `pybamm/` folder and skip the doctests in that PR because nothing changed under `docs/`, failures caused by that incorrect change would then start showing up in future, unrelated PRs, which becomes troublesome for the PR author or the maintainers to fix every time. We will need to decide on how to go about this.
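For reference, a minimal sketch of the label-based approach (the label and job names here are hypothetical, not existing PyBaMM configuration): a job-level `if:` can read the labels on the pull request from the event payload. Note that the workflow only sees the labels present at trigger time, which is why `labeled`/`unlabeled` are added to the event types.

```yaml
# Hypothetical sketch: skip a job when the PR carries a "skip tests" label.
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled, unlabeled]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'skip tests') }}
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: echo "run the test suite here"
```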
I suppose skipping the benchmark runs on documentation-only fixes would be fine. It is the longest job and we don't need it to run everywhere, especially when we're not pushing changes to an external repository like we do with the CRON job. SciPy's CI configuration has a lot of such things we can seek inspiration from; for example, they check for a `[Build wheels]` tag in a commit message, which is useful when debugging compiled code in a PR – and so on. Feel free to pitch in @arjxn-py @Saransh-cpp. If you have any suggestions regarding this based on what you have observed with PyBaMM's infrastructure, please feel free to list them in this thread, @adityaraute.
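A rough sketch of that SciPy-style opt-in, assuming a push-triggered workflow (for `pull_request` events the head commit message is not in the event payload directly, so projects typically inspect the PR title or run `git log` in a step instead); the tag and job names are illustrative:

```yaml
# Hypothetical sketch: only build wheels when the last commit message opts in.
on: [push]

jobs:
  build-wheels:
    runs-on: ubuntu-latest
    if: contains(github.event.head_commit.message, '[Build wheels]')
    steps:
      - uses: actions/checkout@v4
      - name: Build wheels
        run: echo "build wheels here"
```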
There is indeed a lot of redundancy in the form of repeated workflows and workflow files (the scheduled tests and the regular CI tests could be integrated into the same file, #3518 is still open, the benchmarks-related jobs could be consolidated into fewer files, etc.).
P.S. The error you are receiving locally with the doctests was fixed recently and came from a missing `setuptools` dependency – could you retry with a cleaned `.nox` folder and a fresh environment? Edit: never mind, I misread and thought this was a test failure on your machine.
I understand the perils of skipping certain checks for certain PRs. But then, can we ensure that checks don't fail without a valid reason? This would mean investigating each workflow and the ways in which it may be failing erroneously.
I suppose the usual practice is to make the tests as strict as possible (fail on any deprecation warnings detected, and so on) while ensuring they are not flaky (i.e., they do not fail without reason). We regularly see benchmark failures because we don't have fixed CI runners. One option would be to run a statistical analysis on the past benchmark data available on https://pybamm.org/benchmarks and adjust the failure thresholds from the distribution obtained, so that a spurious failure shows up only once or twice in a hundred pull request runs. The other option is to improve the benchmark times themselves by improving the solver. Both of these are hard to do, but one thing that can be done is to not fail the workflow and instead have it emit a non-blocking warning, i.e., mark it with a ⚠️ rather than a ❌.
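As a sketch of that last idea: GitHub Actions has no true "warning" check status, but a job-level `continue-on-error: true` keeps a flaky benchmark job from failing the overall workflow run (the job's own failure remains visible in the run). The job layout below is hypothetical and the asv invocation is only illustrative of how the benchmarks might be run:

```yaml
# Hypothetical sketch: let benchmark regressions surface without blocking the PR.
on: [pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    continue-on-error: true  # a failure here no longer fails the workflow run
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run asv benchmarks (illustrative commands)
        run: |
          pip install asv
          asv machine --yes
          asv run --quick
```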
As for the other workflows, most of them don't fail at random, except for the link checker, which sometimes fails on certain links (we have a lot of them throughout the source code).
Thanks @adityaraute for opening this – it would really save resources and potentially make development faster. I had also been thinking that we should `[skip docs]` unless there is a documentation-related change, and that benchmarks can be skipped in most PRs. +1 for @agriyakhetarpal's suggestion; can we maybe also add complexity and priority labels to this? If this doesn't get picked up for a while, I'd be happy to work on it after I finish up my pending PRs.
I agree, though let's keep this open for a while first for more discussion. For now, we have these scenarios:
We can create file patterns based on these points, though for some of them it might make sense to do so after #3641 is complete. Please let me know if there are more scenarios that should be covered.
Might be helpful for detecting changes across a range of commits on a pull request:
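The linked resource isn't reproduced here, but one commonly used option (an assumption on my part, not necessarily what was linked) is the `dorny/paths-filter` action, which evaluates globs against the commits in a PR and exposes the results as step outputs:

```yaml
# Hypothetical sketch: detect which areas a PR touches and gate later steps on that.
on: [pull_request]

jobs:
  doctests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            docs:
              - 'docs/**'
            source:
              - 'pybamm/**'
      - name: Run doctests only when relevant files changed
        if: steps.filter.outputs.docs == 'true' || steps.filter.outputs.source == 'true'
        run: echo "run the doctests here"
```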
@agriyakhetarpal Can we consider this resolved now? There have been a bunch of improvements.
Not for now, I would say – the recent improvements have been great (#4125, #4099), but we want to do much more on this in the coming months. @prady0t will be starting out with restructuring some parts of the workflows aside from the `pytest` migration goal in the GSoC timeline, and @cringeyburger will help out with the build-system-related changes.
@agriyakhetarpal, should we perhaps put together a checklist of the tasks one would need to do, and a starting point?
I think so – the easiest place to start would be to implement some patterns in the `paths:` and `paths-ignore:` fields of the workflows so that we don't run all the tests on every PR. I can add a longer checklist later; that's usually the longer thing to plan out with issues like this.
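A minimal sketch of what that could look like (the globs are illustrative and would need tuning to the actual repository layout; GitHub does not allow `paths:` and `paths-ignore:` for the same event, so the two filters would live in different workflow files):

```yaml
# Hypothetical test workflow: skipped entirely for documentation-only changes.
name: Tests
on:
  pull_request:
    paths-ignore:
      - 'docs/**'
      - '**/*.md'

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the test suite here"

# A separate, hypothetical docs workflow would instead use `paths:`, e.g.
#   on:
#     pull_request:
#       paths:
#         - 'docs/**'
#         - 'pybamm/**/*.py'  # docstrings feed the API docs
```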
A good place for anyone who wants to start with this is this resource: https://learn.scientific-python.org/development/guides/gha-basic/#advanced-usage
Description
I was scrolling through the PRs and noticed that most of them fail some check or the other, often the same ones.
I've attached a screenshot where a missing package led to a check failure, but there are several similar instances.
Motivation
If checks fail for reasons unrelated to the change at hand, I wonder whether that makes the check or action itself irrelevant, at least to the majority of PRs.
Possible Implementation
Reduce the number of actions/checks attached to certain workflows, so that only those essential for a given PR are run. Additionally, investigate why certain actions fail repeatedly and rectify any underlying issues in the core repository's build.
Additional context
An example of an action running Ubuntu tests on PRs