nadeemlab / SPT

Spatial profiling toolbox for spatial characterization of tumor immune microenvironment in multiplex images
https://oncopathtk.org

Test harness simplification #231

Closed jimmymathews closed 2 months ago

jimmymathews commented 1 year ago

The full test suite now takes about 8 minutes, which is long enough to significantly slow down development.

Moreover, despite the "unit test" and "module test" labels, most of the tests are not actually unit tests. In writing tests we have freely availed ourselves of the full complexity of the datasets as presented by a live postgres image in the docker composition. This is great for test quality (the production environment is simulated accurately), but tests of this kind should not need to run frequently and should rarely fail.

For this issue, review all tests and separate into:

  1. Actual unit tests of the Python package, runnable with pytest. All of these together should run in 20 seconds or less.
  2. Full-fledged integration tests.
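
One way the split above could be expressed in pytest is with a custom marker and a command-line flag, following the pattern described in the pytest documentation for skipping tests based on a command-line option. This is only a sketch, not the layout SPT actually uses: the marker name `integration` and the flag `--integration` are assumptions chosen for illustration.

```python
# conftest.py (hypothetical; the marker and flag names are illustrative assumptions)
import pytest


def pytest_addoption(parser):
    # Opt-in flag: integration tests run only when this is passed.
    parser.addoption(
        "--integration",
        action="store_true",
        default=False,
        help="also run tests that need the live postgres docker composition",
    )


def pytest_configure(config):
    # Register the marker so pytest does not warn about an unknown mark.
    config.addinivalue_line(
        "markers",
        "integration: test requires the docker-compose services",
    )


def pytest_collection_modifyitems(config, items):
    # With --integration, leave the collected tests untouched.
    if config.getoption("--integration"):
        return
    # Otherwise mark every integration test as skipped, so a plain
    # `pytest` run executes only the fast unit tests.
    skip_integration = pytest.mark.skip(reason="needs --integration to run")
    for item in items:
        if "integration" in item.keywords:
            item.add_marker(skip_integration)
```

Under these assumptions, a test decorated with `@pytest.mark.integration` is skipped by a plain `pytest` run and executed by `pytest --integration`, which is the invocation a CI job would use.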

The integration tests (2) should be migrated to run as GitHub Actions (GA) when a PR is made or when commits are added to a PR. I believe GA supports more or less the entire docker-compose based test harness we currently have running locally on development machines.

Then the expected workflow going forward would be:

  1. Update some code.
  2. Run pytest.
  3. Fix code in response to failures.
  4. When believed to be ready to merge, make a PR.
  5. Integration tests are triggered and results viewable in GitHub UI. Normally, these checks pass.
  6. If checks fail, the failures need to be fixed iteratively, and the PR effectively becomes a draft.

Resolving this issue will also relieve the Makefiles from a lot of their current responsibility, slimming them down considerably.

jimmymathews commented 2 months ago

There is still no CI/CD, and for the most part the test suite can be run by the maintainer just before merging into main. Most of the worst-offending slow tests have been removed, the whole process now takes closer to 5 minutes, and tests can be individually cancelled or run in groups related to a single module.