nextstrain / augur

Pipeline components for real-time phylodynamic analysis
https://docs.nextstrain.org/projects/augur/
GNU Affero General Public License v3.0

No standard test environment #1557

Open tsibley opened 3 months ago

tsibley commented 3 months ago

Augur lacks a standard test environment across individual developer machines and CI. This results in some test implementations that must use the "lowest common denominator" of programs expected to be ambiently available. For example, we don't use jq because it's (maybe surprisingly?) not available on some dev machines.

https://github.com/nextstrain/augur/blob/6801233ffb1b6ed19ef782a15f8c4e70bd597d4f/tests/functional/refine/cram/year-bounds.t#L40-L44

https://github.com/nextstrain/augur/blob/6801233ffb1b6ed19ef782a15f8c4e70bd597d4f/tests/functional/refine/cram/year-bounds.t#L65

https://github.com/nextstrain/augur/blob/6801233ffb1b6ed19ef782a15f8c4e70bd597d4f/tests/functional/translate/cram/root-mutations.t#L10-L17

https://github.com/nextstrain/augur/blob/6801233ffb1b6ed19ef782a15f8c4e70bd597d4f/tests/functional/titers/cram/titers-sub-with-tree.t#L26
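
To illustrate the kind of workaround this implies (a hypothetical sketch, not the code in the linked tests): a test that cannot assume jq is installed can still pull a value out of JSON output with Python's standard json module, since Python is present wherever Augur itself runs. The function name get_field and the sample document below are made up for the example.

```python
import json


def get_field(document: str, *path):
    """Walk a parsed JSON document along a sequence of keys or indices,
    roughly what a jq filter such as `.nodes.NODE_0001.date` would do."""
    node = json.loads(document)
    for key in path:
        node = node[key]
    return node


# Made-up sample data, only here to exercise the helper.
example = '{"nodes": {"NODE_0001": {"date": "2000-01-01"}}}'
print(get_field(example, "nodes", "NODE_0001", "date"))
```

This is more verbose than the equivalent jq one-liner, which is the cost of targeting the lowest common denominator.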

I think in the past we've also accidentally run into differences between the GNU (coreutils) vs. BSD (macOS) core system commands. In our runtimes, we standardize on GNU coreutils to avoid those issues.
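
One rough way to notice which flavor of a tool a machine has is to look at its --version output. The sketch below is only a heuristic under stated assumptions: GNU tools typically print a banner such as "sort (GNU coreutils) ...", while BSD tools either reject --version or print something else. The function names are hypothetical and nothing like this exists in Augur's test suite as far as this thread says.

```python
import subprocess


def looks_like_gnu(version_text: str) -> bool:
    """Heuristic: GNU tools print a banner like 'sed (GNU sed) 4.8'.
    Matching '(GNU ' rather than just 'GNU' avoids misreading banners
    such as 'grep (BSD grep, GNU compatible) ...'."""
    lines = version_text.splitlines()
    return bool(lines) and "(GNU " in lines[0]


def tool_flavor(tool: str) -> str:
    """Return 'gnu', 'other', 'missing', or 'unknown' for a command name."""
    try:
        result = subprocess.run(
            [tool, "--version"],
            capture_output=True,
            text=True,
            stdin=subprocess.DEVNULL,  # never wait on interactive input
            timeout=5,
            check=False,  # BSD tools may exit non-zero on --version
        )
    except OSError:
        # Covers FileNotFoundError and similar launch failures.
        return "missing"
    except subprocess.TimeoutExpired:
        return "unknown"
    return "gnu" if looks_like_gnu(result.stdout) else "other"


if __name__ == "__main__":
    for name in ["sed", "sort", "grep"]:
        print(name, tool_flavor(name))
```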

The CI environment is currently:

https://github.com/nextstrain/augur/blob/f49a3e47e0109194ca02fe159409f2cfe9131e8c/.github/workflows/ci.yaml#L65-L80

It would be nice to have a standard environment we can rely on in tests. Alternatively, I'd also be happy simply documenting what we expect to be available for tests—I don't think it's much beyond "standard stuff"—and letting people take their own preferred way of getting there.
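
If the documentation route is taken, the expected programs could also be checked mechanically before the tests run. The following is a minimal sketch, assuming a hand-maintained list; the program names in EXPECTED_PROGRAMS are placeholders, since this thread does not enumerate what the tests actually need.

```python
import shutil

# Placeholder list: the real set of programs the tests rely on is not
# spelled out in this issue and would need to be filled in.
EXPECTED_PROGRAMS = ["python3", "jq", "sed", "grep"]


def missing_programs(programs):
    """Return the names from `programs` that are not found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(EXPECTED_PROGRAMS)
    if missing:
        print("Missing programs expected by the tests:", ", ".join(missing))
    else:
        print("All expected programs are available.")
```

A check like this leaves developers free to install the programs however they prefer while still failing early with a clear message.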

genehack commented 3 months ago

Perhaps providing a conda dev_env.yaml file would be a step in the right direction?

victorlin commented 3 months ago

> Perhaps providing a conda dev_env.yaml file would be a step in the right direction?

Sounds reasonable. How about these instructions?

  1. Set up an environment that contains development dependencies listed in dev_env.yaml. You can have these programs available ambiently or use conda env create -f dev_env.yaml.
  2. Install the local copy of Augur into that environment: python3 -m pip install -e .'[dev]'

dev_env.yaml should be readable enough that it can serve as "simply documenting what we expect to be available for tests", and we could use it in CI to create the environment automatically.

genehack commented 3 months ago

> You can have these programs available ambiently or use conda env create -f dev_env.yaml.

I'm probably going to be an outlier in this opinion, but I think it would be better if there was a clear and unambiguous set of directions that results in a defined environment that supports development work on the augur codebase.

If we want to provide an escape hatch before or after those instructions, that's fine, but as a person approaching this project from the outside, wanting to get started working on it with minimal fuss, I do not want options I have to evaluate; I just want directions to follow.

> we could use it in CI to create the environment automatically.

Yes, if we do this, I would think we would want to use it ourselves as much as possible.

tsibley commented 3 months ago

If we really want to lock down dev and CI envs to be the same and have a single golden direction to follow, Pixi would help us fit that bill. (I'm not doing this now, though, and am not even convinced we want to.)

tsibley commented 3 months ago

This line will need updating:

https://github.com/nextstrain/augur/blob/f49a3e47e0109194ca02fe159409f2cfe9131e8c/docs/contribute/DEV_DOCS.md#L100

to describe how to install the dev env that solves this issue.

Even currently it needs updating, as there's no need to install a Nextstrain runtime for Augur dev (and that link is very old, but still works thanks to our diligent redirecting).

genehack commented 3 months ago

> If we really want to lock down dev and CI envs to be the same and have a single golden direction to follow, Pixi would help us fit that bill. (I'm not doing this now, though, and am not even convinced we want to.)

I'm not sure why we wouldn't use Conda?

victorlin commented 3 months ago

> I'm not sure why we wouldn't use Conda?

Pixi's documentation includes a page comparing it to Conda.

It looks like the main benefit is exact pinning using a pixi.lock lock-file, at least when compared to a loosely defined dev_env.yaml file. Conda doesn't have that built in, but there are additions such as conda-lock. (Note: I've never used Pixi or conda-lock.)

We could start with Conda given existing familiarity, then consider additions/alternatives such as conda-lock/Pixi if Conda ends up being insufficient.