adityaraute opened this issue 10 months ago
Thanks for the suggestion – I have had a variety of ideas like this in mind for a while, based on recommendations from other repositories. Adding labels such as `[skip tests]` and `[skip docs]` to group checks would be worthwhile and is easy to do with already-supported GitHub Actions syntax. The issue is long-term maintainability: the file globs needed to run checks conditionally become too complex to accommodate all such cases. For example, if I modify a docstring incorrectly in the `pybamm/` folder and skip the doctests in that PR because nothing changed under `docs/`, failures caused by that incorrect change would then start showing up in future, unrelated PRs, which becomes troublesome for the PR author or the maintainers to fix every time. We will need to decide on how to go about this.
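For reference, a minimal sketch of the label-based approach (the label and job names here are hypothetical, not existing PyBaMM configuration): a job-level `if:` can read the labels on the pull request from the event payload. Note that the workflow only sees the labels present at trigger time, which is why `labeled`/`unlabeled` are added to the event types.

```yaml
# Hypothetical sketch: skip a job when the PR carries a "skip tests" label.
on:
  pull_request:
    types: [opened, synchronize, reopened, labeled, unlabeled]

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'skip tests') }}
    steps:
      - uses: actions/checkout@v4
      - name: Run unit tests
        run: echo "run the test suite here"
```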
I suppose skipping the benchmark runs on documentation-only fixes would be fine. It is the longest job and we don't need it to run everywhere, especially when we're not pushing changes to an external repository like we do with the CRON job. SciPy's CI configuration has a lot of such things we can seek inspiration from; for example, they check for a `[Build wheels]` tag in a commit message, which is useful when debugging compiled code in a PR – and so on. Feel free to pitch in @arjxn-py @Saransh-cpp. If you have any suggestions regarding this based on what you have observed with PyBaMM's infrastructure, please feel free to list them in this thread, @adityaraute.
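A rough sketch of that SciPy-style opt-in, assuming a push-triggered workflow (for `pull_request` events the head commit message is not in the event payload directly, so projects typically inspect the PR title or run `git log` in a step instead); the tag and job names are illustrative:

```yaml
# Hypothetical sketch: only build wheels when the last commit message opts in.
on: [push]

jobs:
  build-wheels:
    runs-on: ubuntu-latest
    if: contains(github.event.head_commit.message, '[Build wheels]')
    steps:
      - uses: actions/checkout@v4
      - name: Build wheels
        run: echo "build wheels here"
```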
There is indeed a lot of redundancy in the form of repeated workflows and workflow files (the scheduled tests and the regular CI tests could be integrated into the same file, #3518 is still open, the benchmarks-related jobs could be consolidated into fewer files, etc.).
P.S. The error you are receiving locally with the doctests was fixed recently and came from a missing `setuptools` dependency – could you retry with a cleaned `.nox` folder and a fresh environment? Edit: never mind, I misread and thought this was a test failure on your machine.
I understand the perils of skipping certain checks for certain PRs. But then, can we ensure that checks don't fail without a valid reason? This would mean investigating each workflow and the ways in which it may be failing erroneously.
I suppose the usual practice is to make the tests as strict as possible (fail on any deprecation warnings detected, and so on) while ensuring they are not flaky (i.e., they do not fail without reason). We regularly see benchmark failures because we don't have fixed CI runners. One option would be to run a statistical analysis on the past benchmark data available on https://pybamm.org/benchmarks and adjust the failure thresholds from the distribution obtained, so that a spurious failure shows up only once or twice in a hundred pull request runs. The other option is to improve the benchmark times themselves by improving the solver. Both of these are hard to do, but one thing that can be done is to not fail the workflow and instead have it emit a non-blocking warning, i.e., mark it with a ⚠️ rather than a ❌.
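As a sketch of that last idea: GitHub Actions has no true "warning" check status, but a job-level `continue-on-error: true` keeps a flaky benchmark job from failing the overall workflow run (the job's own failure remains visible in the run). The job layout below is hypothetical and the asv invocation is only illustrative of how the benchmarks might be run:

```yaml
# Hypothetical sketch: let benchmark regressions surface without blocking the PR.
on: [pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    continue-on-error: true  # a failure here no longer fails the workflow run
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Run asv benchmarks (illustrative commands)
        run: |
          pip install asv
          asv machine --yes
          asv run --quick
```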
As for the other workflows, most of them don't fail at random, except for the link checker, which sometimes fails on certain links (we have a lot of them throughout the source code).
Thanks @adityaraute for opening this – it would really save resources and potentially make development faster. I had also been thinking that we should `[skip docs]` unless there is a documentation-related change, and that benchmarks can be skipped in most PRs. +1 for @agriyakhetarpal's suggestion; can we maybe also add complexity and priority labels to this? If this doesn't get picked up for a while, I'd be happy to work on it after I finish up my pending PRs.
I agree, though let's keep this open for a while first for more discussion. For now, we have these scenarios:
We can create file patterns based on these points, though for some of them it might make sense to do so after #3641 is complete. Please let me know if there are more scenarios that should be covered.
Might be helpful for detecting changes across a range of commits on a pull request:
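The linked resource isn't reproduced here, but one commonly used option (an assumption on my part, not necessarily what was linked) is the `dorny/paths-filter` action, which evaluates globs against the commits in a PR and exposes the results as step outputs:

```yaml
# Hypothetical sketch: detect which areas a PR touches and gate later steps on that.
on: [pull_request]

jobs:
  doctests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            docs:
              - 'docs/**'
            source:
              - 'pybamm/**'
      - name: Run doctests only when relevant files changed
        if: steps.filter.outputs.docs == 'true' || steps.filter.outputs.source == 'true'
        run: echo "run the doctests here"
```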
@agriyakhetarpal Can we consider this resolved now? There have been a bunch of improvements.
Not for now, I would say – the recent improvements have been great (#4125, #4099), but we want to do much more on this in the coming months. @prady0t will be starting out with restructuring some parts of the workflows aside from the `pytest` migration goal in the GSoC timeline, and @cringeyburger will help out with the build-system-related changes.
@agriyakhetarpal, should we perhaps put together a checklist of the tasks one would need to do, and a starting point?
I think so – the easiest place to start would be to implement some patterns in the `paths:` and `paths-ignore:` fields of the workflows so that we don't run all the tests on every PR. I can add a longer checklist later; that's usually the longer thing to plan out with issues like this.
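A minimal sketch of what that could look like (the globs are illustrative and would need tuning to the actual repository layout; GitHub does not allow `paths:` and `paths-ignore:` for the same event, so the two filters would live in different workflow files):

```yaml
# Hypothetical test workflow: skipped entirely for documentation-only changes.
name: Tests
on:
  pull_request:
    paths-ignore:
      - 'docs/**'
      - '**/*.md'

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: echo "run the test suite here"

# A separate, hypothetical docs workflow would instead use `paths:`, e.g.
#   on:
#     pull_request:
#       paths:
#         - 'docs/**'
#         - 'pybamm/**/*.py'  # docstrings feed the API docs
```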
A good place for anyone who wants to start with this is this resource: https://learn.scientific-python.org/development/guides/gha-basic/#advanced-usage
Description
I was scrolling through the PRs and noticed that most of them fail some check or the other, often the same ones.
I've attached a screenshot where a missing package led to a check failure, but there are several similar instances.
Motivation
If checks fail for reasons unrelated to the change at hand, I wonder whether that makes the check or action itself irrelevant, at least to the majority of PRs.
Possible Implementation
Reduce the number of actions/checks attached to certain workflows, so that only those essential for a given PR are run. Additionally, investigate why certain actions fail repeatedly and rectify any underlying issues in the core repository's build.
Additional context
An example of an action running Ubuntu tests on PRs