Reduce feedback time from the testsuite

jngrad commented 5 months ago

On my 16-core AMD Ryzen workstation, the testsuite takes 40 min to run, which is unreasonable. For comparison, the ESPResSo testsuite takes 3 min to run 196 tests on my workstation, or 5 min when all 15 thermalization and titration tests are included.

I'm afraid contributors will be unwilling to run the pyMBE testsuite after each commit due to how long it takes for it to complete. That will lead to contributors getting their inbox flooded with automated e-mails about failing tests, which can be demotivating. Reviewers will also have a harder time verifying code contributions, since the PR will have a lot of small commits to fix regressions introduced in larger commits, thus increasing noise.

We could separate the testsuite into unit tests that provides fast feedback and acceptance tests that verify the reproducibility of published papers. The unit tests should run on every PR. The acceptance tests should run once in a month on the main branch, with the possibility of manually triggering the job via e.g. workflow_dispatch. The latter is important, as contributors or reviewers may want to occasionally check if changes to acceptance tests did not break something by manually triggering the job. Likewise, every time a software release is made, maintainers should manually trigger the acceptance tests to show the release passes all tests.

Acceptance tests might fail once in a while on the main branch, when a regression is accidentally merged on the main branch. Maintainers should react to such a failing pipeline within the next 30 days. When unit tests pass, and the code covered by unit tests is large, then failing acceptance tests are an indication that something extremely subtle is broken. This type of issue requires a long investigation that may exceed the time budget or skill level of most contributors. The community should be allowed to continue opening PRs and getting them merged while experts investigate the broken acceptance tests.

For illustration purposes, historically in ESPResSo, acceptance tests broke due to correlated RNG noise in the Langevin thermalization code (2 afternoons and 3 core developers to debug it), rejected Monte Carlo moves that didn't restore the original particle velocities (1 core developer and 2 external Monte Carlo experts to debug it), incorrect unit conversion in the LB particle coupling code (2 months and 2 cores developers to solve it).

pm-blanco commented 5 months ago

@jngrad the workflow for the Ci testing that you suggest sounds very reasonable to me, we certainly need to reduce the CI running time to get faster response times and splitting it in this way would be a good way of fixing it

jngrad commented 5 months ago

Offline discussion with @kosovan: it is an open question, whether acceptance tests should run once every n days, or once at merge time. The latter ensures the main branch is always bug-free, to the best of our knowledge. From a technical point of view, this can be achieved via GitHub merge queues: when merging a PR, a temporary branch is created containing the merge commit. Acceptance tests are automatically triggered on that branch via the on: merge_group event. When CI is successful, the branch gets fast-forwarded on the main branch. I don't have a strong opinion on which one of these two strategies should be adopted in pyMBE.

Here are the pros and cons:

monthly builds:
- pros:
  - reduces CI resources consumption
  - other PRs can still be merged when a regression that doesn't affect unit tests is present on the main branch
- cons:
  - core team must have the discipline to address failing CI on the main branch in a timely manner
merge queue:
- pros:
  - regressions are never introduced on the main branch
  - responsibility of fixing regressions falls upon the PR contributor(s)
- cons:
  - PRs in the merge queue may be merged out-of-order (if PR 1 fails CI, PR 2 still gets merged if it passes CI)
  - there is a long delay between clicking the merge button and the main branch receiving the merge commit

ESPResSo uses nightly builds with more thorough tests in an effort to both reduce CI power usage, and not penalize new contributors who may not have the necessary expertise to address CI errors due to floating-point accuracy issues, missing ESPResSo features, or failing sanity checks on the Javascript and HTML code generated by Jupyter Notebooks. This is especially useful during coding days, as we need to merge a lot of PRs in a short amount of time to keep our momentum. To make sure the core team deals with failing nightly builds in a timely fashion, ESPResSo has a bot that opens a new issue every day until the core team fixes the main branch. Whether pyMBE should use nightly vs weekly vs monthly builds is debatable. I suggested monthly because it aligns with our current monthly dev meetings.

pm-blanco commented 5 months ago

@jngrad @kosovan:

As for now, the "monthly builds" or maybe weekly builds sounds a better option to me. I can imagine that the merge queue is going the end up being a quite mess once we start merging PR in different order.
It is true that the "monthly builds" option might result in temporary buggy main branches of pyMBE. However, I think we should start adopting the strategy that is now used in Espresso: have stable release versions and the main branch for development. This also would have the advantage that our users can use a release version of pyMBE with a constant API (which we are changing nowadays way too often).
In any case, I would leave the CI as it is for now and think about it a bit more. We should discuss it on our next pyMBE-dev meeting together with the rest of the team

pm-blanco commented 4 months ago

Outcome from online discussion of the pyMBE-dev team, we have bi-weekly builds of the CI for the interacting tests.

pyMBE-dev / pyMBE

Reduce feedback time from the testsuite #56