pyMBE-dev / pyMBE

pyMBE provides tools to facilitate building up molecules with complex architectures in the Molecular Dynamics software ESPResSo. For an up-to-date API documention please check our website:
https://pymbe-dev.github.io/pyMBE/pyMBE.html
GNU General Public License v3.0
6 stars 8 forks source link

Reduce feedback time from the testsuite #56

Closed jngrad closed 4 months ago

jngrad commented 5 months ago

On my 16-core AMD Ryzen workstation, the testsuite takes 40 min to run, which is unreasonable. For comparison, the ESPResSo testsuite takes 3 min to run 196 tests on my workstation, or 5 min when all 15 thermalization and titration tests are included.

I'm afraid contributors will be unwilling to run the pyMBE testsuite after each commit due to how long it takes for it to complete. That will lead to contributors getting their inbox flooded with automated e-mails about failing tests, which can be demotivating. Reviewers will also have a harder time verifying code contributions, since the PR will have a lot of small commits to fix regressions introduced in larger commits, thus increasing noise.

We could separate the testsuite into unit tests that provides fast feedback and acceptance tests that verify the reproducibility of published papers. The unit tests should run on every PR. The acceptance tests should run once in a month on the main branch, with the possibility of manually triggering the job via e.g. workflow_dispatch. The latter is important, as contributors or reviewers may want to occasionally check if changes to acceptance tests did not break something by manually triggering the job. Likewise, every time a software release is made, maintainers should manually trigger the acceptance tests to show the release passes all tests.

Acceptance tests might fail once in a while on the main branch, when a regression is accidentally merged on the main branch. Maintainers should react to such a failing pipeline within the next 30 days. When unit tests pass, and the code covered by unit tests is large, then failing acceptance tests are an indication that something extremely subtle is broken. This type of issue requires a long investigation that may exceed the time budget or skill level of most contributors. The community should be allowed to continue opening PRs and getting them merged while experts investigate the broken acceptance tests.

For illustration purposes, historically in ESPResSo, acceptance tests broke due to correlated RNG noise in the Langevin thermalization code (2 afternoons and 3 core developers to debug it), rejected Monte Carlo moves that didn't restore the original particle velocities (1 core developer and 2 external Monte Carlo experts to debug it), incorrect unit conversion in the LB particle coupling code (2 months and 2 cores developers to solve it).

pm-blanco commented 5 months ago

@jngrad the workflow for the Ci testing that you suggest sounds very reasonable to me, we certainly need to reduce the CI running time to get faster response times and splitting it in this way would be a good way of fixing it

jngrad commented 5 months ago

Offline discussion with @kosovan: it is an open question, whether acceptance tests should run once every n days, or once at merge time. The latter ensures the main branch is always bug-free, to the best of our knowledge. From a technical point of view, this can be achieved via GitHub merge queues: when merging a PR, a temporary branch is created containing the merge commit. Acceptance tests are automatically triggered on that branch via the on: merge_group event. When CI is successful, the branch gets fast-forwarded on the main branch. I don't have a strong opinion on which one of these two strategies should be adopted in pyMBE.

Here are the pros and cons:

ESPResSo uses nightly builds with more thorough tests in an effort to both reduce CI power usage, and not penalize new contributors who may not have the necessary expertise to address CI errors due to floating-point accuracy issues, missing ESPResSo features, or failing sanity checks on the Javascript and HTML code generated by Jupyter Notebooks. This is especially useful during coding days, as we need to merge a lot of PRs in a short amount of time to keep our momentum. To make sure the core team deals with failing nightly builds in a timely fashion, ESPResSo has a bot that opens a new issue every day until the core team fixes the main branch. Whether pyMBE should use nightly vs weekly vs monthly builds is debatable. I suggested monthly because it aligns with our current monthly dev meetings.

pm-blanco commented 5 months ago

@jngrad @kosovan:

pm-blanco commented 4 months ago

Outcome from online discussion of the pyMBE-dev team, we have bi-weekly builds of the CI for the interacting tests.