microbiomedata / nmdc-runtime

Runtime system for NMDC data management and orchestration
https://microbiomedata.github.io/nmdc-runtime/
Other
7 stars 3 forks source link

Replace custom build system with Poetry #563

Open eecavanna opened 5 months ago

eecavanna commented 5 months ago

Background

Currently, the Runtime uses a custom dependency management solution that works something like this:

  1. Update a requirements/*.in file: main.in for production dependencies and dev.in for development dependencies
  2. Run the command $ make update-deps, which — under the hood — runs these commands, which update the main.txt and dev.txt files https://github.com/microbiomedata/nmdc-runtime/blob/f0e809646f20b1c635c2ba1324f3294d4dd4fb0b/Makefile#L6-L16
  3. Run the command $ make init, which — under the hood — runs these commands, which install the packages listed in main.txt and dev.txt into the current Python environment

Pros:

Cons:

Proposal

Switch to using Poetry.

If the Runtime used Poetry, the above process would become:

  1. Update the pyproject.toml file (there are different sections of the file for production versus development dependencies)
  2. Run $ poetry install to install the packages listed in the file — this will generate a poetry.lock file if one doesn't already exist (commit this file to the repo once it exists)
dwinston commented 5 months ago

Sure, let's pursue this. The main wrinkle is reproducible Docker builds, which may just mean adding a

poetry export -f requirements.txt -o requirements.txt

step (https://pythonspeed.com/articles/pipenv-docker/)?

some more context that informed my initial decision wrt pip-tools:

eecavanna commented 5 months ago

OK, thanks.

I was under the impression having the poetry.lock file in the repo (and not having or using a requirements.txt file anywhere, at any time) would suffice, as long as dependencies are installed via poetry install. The docs for poetry install (source) say:

The install command reads the pyproject.toml file from the current project, resolves the dependencies, and installs them.

If there is a poetry.lock file in the current directory, it will use the exact versions from there instead of resolving them. This ensures that everyone using the library will get the same versions of the dependencies.

Based on that, I don't think a requirements.txt file is necessary for reproducibility, specifically.

I briefly skimmed the first link in your message and saw that the introduction of a requirements.txt file may be to enable us to benefit more from cacheing in order to speed up container image builds. I am in favor of speeding up container image builds from their current durations.

eecavanna commented 13 hours ago

I'll take a crack at this tonight. I don't think I'll finish tonight, but I think I'll at least come away with more knowledge about what will be involved with making this switch.

eecavanna commented 12 hours ago

I've run into my first hiccup: incompatible ("contradictory") requirements (you can ignore the yellow line about the broken pre-existing environment).

image

I don't know whether this also happens with the current build system and I just haven't noticed it, or it is unique to either Poetry or the version specifiers I ended up with in the pyproject.toml file, which are:

mkdocs-mermaid2-plugin = "^1.2.1"
# ...
nmdc-schema = "==11.1.0"

Anyway, I'll try to resolve it.

I see that nmdc-schema (in its own project.toml file) has declared mkdocs-mermaid2-plugin as one of its production dependencies. I think that imposes that constraint on every project that uses nmdc-schema.

eecavanna commented 12 hours ago

I resolved that one by updating the version constraint to: ">=0.6.0, <0.7.0"

# Note: The specification for the latest version of this package is (currently) `^1.2.1`. The reason we
#       are specifying an older version is that we also depend on `nmdc-schema==11.1.0`, which specifies
#       that older version as one of its production dependencies.
#
# TODO: Inquire with `nmdc-schema` maintainers about addressing their dependence upon the
#       old version of this package.
#
mkdocs-mermaid2-plugin = ">=0.6.0, <0.7.0"

For reference, the requirements/main.txt file (from the old build system) says:

mkdocs-mermaid2-plugin==0.6.0
    # via
    #   -r requirements/main.in
    #   nmdc-schema

However, after making that change, when I re-ran $ poetry install, I got a new, unrelated conflict:

image

There may be additional conflicts beyond this second one. I do not plan to post about each additional on in a comment here. I am noting them in comments in the pyproject.toml file (alongside the many TODO comments there).

eecavanna commented 12 hours ago

Yahoo! There were only 1-2 additional conflicts and they are all resolved now. Poetry has generated its poetry.lock file.

eecavanna commented 12 hours ago

I created a draft PR (linked above by GitHub) containing the pyproject.toml and poetry.lock files. I think there's still a ways to go here (e.g. updating Dockerfiles, updating the Makefile, updating documentation).

eecavanna commented 8 hours ago

The PR is continuing to take shape. All the GitHub Actions workflows are passing again (with Poetry in the mix and the requirements directory gone). I'll look at this some more next week.