microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/

Replace custom build system with Poetry #563

Open eecavanna opened 1 week ago

eecavanna commented 1 week ago

Background

Currently, the Runtime uses a custom dependency management solution that works something like this:

  1. Update a requirements/*.in file: main.in for production dependencies and dev.in for development dependencies
  2. Run the command $ make update-deps, which runs the commands at https://github.com/microbiomedata/nmdc-runtime/blob/f0e809646f20b1c635c2ba1324f3294d4dd4fb0b/Makefile#L6-L16 to update the main.txt and dev.txt files
  3. Run the command $ make init, which in turn runs commands that install the packages listed in main.txt and dev.txt into the current Python environment
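The compiled main.txt and dev.txt files are what make this workflow repeatable: each package is pinned to an exact version. The following is a minimal Python sketch that reads those pins. It assumes the files use pip-compile's "name==version" lines with "# via" comments, which is an assumption based on the later mention of pip-tools in this thread. The function name read_pins and the sample contents are hypothetical.

```python
import re

# Assumed pip-compile style pin, e.g. "fastapi==0.103.1" or "pkg[extra]==1.2";
# extras in brackets are dropped, and the version stops at whitespace, ";" or "\".
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)(?:\[[^\]]*\])?==([^\s;\\]+)")


def read_pins(text: str) -> dict[str, str]:
    """Hypothetical helper: return {normalized package name: pinned version}."""
    pins: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop "# via ..." comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and options such as "-r" or "--hash"
        match = PIN.match(line)
        if match:
            # PEP 503 name normalization: runs of "-", "_", "." become "-".
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


# Hypothetical excerpt of a compiled file, not the Runtime's real main.txt.
sample = "fastapi==0.103.1\n    # via -r requirements/main.in\n"
print(read_pins(sample))  # {'fastapi': '0.103.1'}
```

Lines that are not exact "==" pins are skipped rather than guessed at, since a compiled file is expected to pin everything.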

Pros:

Cons:

Proposal

Switch to using Poetry.

If the Runtime used Poetry, the above process would become:

  1. Update the pyproject.toml file (it has separate sections for production and development dependencies)
  2. Run $ poetry install to install the packages listed in the file. If no poetry.lock file exists yet, this generates one; commit that file to the repo

dwinston commented 1 week ago

Sure, let's pursue this. The main wrinkle is reproducible Docker builds, which may just mean adding a

poetry export -f requirements.txt -o requirements.txt

step (https://pythonspeed.com/articles/pipenv-docker/)?

Some more context that informed my initial decision regarding pip-tools:

eecavanna commented 1 week ago

OK, thanks.

I was under the impression that having the poetry.lock file in the repo (and not having or using a requirements.txt file anywhere, at any time) would suffice, as long as dependencies are installed via poetry install. The docs for poetry install say:

The install command reads the pyproject.toml file from the current project, resolves the dependencies, and installs them.

If there is a poetry.lock file in the current directory, it will use the exact versions from there instead of resolving them. This ensures that everyone using the library will get the same versions of the dependencies.

Based on that, I don't think a requirements.txt file is necessary for reproducibility, specifically.

I briefly skimmed the first link in your message and saw that the purpose of introducing a requirements.txt file may be to benefit more from caching, which speeds up container image builds. I am in favor of making container image builds faster than they currently are.
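The caching benefit comes from build-step ordering. The following is a simplified Python model, not Docker's actual cache algorithm, in which each layer's key hashes its parent's key, its instruction text, and the bytes of any files it copies. Equal keys stand for a cache hit. The Dockerfile instructions, file contents, and function names are hypothetical and are not taken from the Runtime.

```python
import hashlib


def layer_keys(steps: list[tuple[str, bytes]]) -> list[str]:
    """Toy cache model: key = hash(parent key, instruction, copied file bytes)."""
    keys: list[str] = []
    parent = ""
    for instruction, copied in steps:
        digest = hashlib.sha256()
        for part in (parent.encode(), instruction.encode(), copied):
            digest.update(part + b"\0")  # separator keeps fields from running together
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def manifest_first(requirements: bytes, source: bytes) -> list[str]:
    # Dependencies are installed before the frequently edited source is copied in.
    return layer_keys([
        ("COPY requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", b""),
        ("COPY . .", source),
    ])


def source_first(requirements: bytes, source: bytes) -> list[str]:
    # The whole build context is copied before dependencies are installed.
    return layer_keys([
        ("COPY . .", requirements + source),
        ("RUN pip install -r requirements.txt", b""),
    ])


reqs = b"fastapi==0.103.1\n"  # hypothetical pinned dependency list
v1, v2 = b"print('v1')\n", b"print('v2')\n"  # two edits of the app source

# Index 1 is the "RUN pip install" layer in both orderings.
print(manifest_first(reqs, v1)[1] == manifest_first(reqs, v2)[1])  # True
print(source_first(reqs, v1)[1] == source_first(reqs, v2)[1])  # False
```

The model only shows why a dependency manifest is copied before the rest of the source; it does not distinguish between a requirements.txt and a pyproject.toml plus poetry.lock serving as that manifest.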