Open eecavanna opened 5 months ago
Sure, let's pursue this. The main wrinkle is reproducible Docker builds, which may just mean adding a `poetry export -f requirements.txt -o requirements.txt` step (https://pythonspeed.com/articles/pipenv-docker/)?
Some more context that informed my initial decision wrt pip-tools:
OK, thanks.
I was under the impression that having the `poetry.lock` file in the repo (and not having or using a `requirements.txt` file anywhere, at any time) would suffice, as long as dependencies are installed via `poetry install`. The docs for `poetry install` (source) say:
> The `install` command reads the `pyproject.toml` file from the current project, resolves the dependencies, and installs them.
>
> If there is a `poetry.lock` file in the current directory, it will use the exact versions from there instead of resolving them. This ensures that everyone using the library will get the same versions of the dependencies.
Based on that, I don't think a `requirements.txt` file is necessary for reproducibility, specifically.
I briefly skimmed the first link in your message and saw that the point of introducing a `requirements.txt` file may be to let us benefit more from caching, which would speed up container image builds. I am in favor of speeding up container image builds from their current durations.
I'll take a crack at this tonight. I don't think I'll finish tonight, but I think I'll at least come away with more knowledge about what will be involved with making this switch.
I've run into my first hiccup: incompatible ("contradictory") requirements (you can ignore the yellow line about the broken pre-existing environment).
I don't know whether this also happens with the current build system and I just haven't noticed it, or it is unique to either Poetry or the version specifiers I ended up with in the `pyproject.toml` file, which are:
mkdocs-mermaid2-plugin = "^1.2.1"
# ...
nmdc-schema = "==11.1.0"
Anyway, I'll try to resolve it.
I see that `nmdc-schema` (in its own `pyproject.toml` file) has declared `mkdocs-mermaid2-plugin` as one of its production dependencies. I think that imposes that constraint on every project that uses `nmdc-schema`.
I resolved that one by updating the version constraint to: ">=0.6.0, <0.7.0"
# Note: The specification for the latest version of this package is (currently) `^1.2.1`. The reason we
# are specifying an older version is that we also depend on `nmdc-schema==11.1.0`, which specifies
# that older version as one of its production dependencies.
#
# TODO: Inquire with `nmdc-schema` maintainers about addressing their dependence upon the
# old version of this package.
#
mkdocs-mermaid2-plugin = ">=0.6.0, <0.7.0"
For reference, the `requirements/main.txt` file (from the old build system) says:
mkdocs-mermaid2-plugin==0.6.0
# via
# -r requirements/main.in
# nmdc-schema
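To make the conflict concrete, here is a minimal Python sketch (not part of the repo; the helper names `parse`, `caret_range`, and `in_range` are made up for illustration). It assumes Poetry's documented caret semantics, where `^1.2.1` means `>=1.2.1, <2.0.0`, and it only handles plain `X.Y.Z` versions. It also assumes that `0.6.0`, the version pinned in the old `requirements/main.txt`, is the version `nmdc-schema==11.1.0` ends up needing.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain "X.Y.Z" version string (no pre-release or build tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_range(spec: str) -> Tuple[Version, Version]:
    """Return (lower, upper) bounds for a caret constraint such as "^1.2.1".

    The upper bound is exclusive and bumps the left-most non-zero component.
    """
    lower = parse(spec.lstrip("^"))
    major, minor, patch = lower
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower, upper


def in_range(version: str, lower: Version, upper: Version) -> bool:
    """Check that lower <= version < upper."""
    return lower <= parse(version) < upper


# Assumption: the version pinned by the old build system (requirements/main.txt).
pinned = "0.6.0"

# The constraint first tried in pyproject.toml.
lower, upper = caret_range("^1.2.1")
print(in_range(pinned, lower, upper))  # False: 0.6.0 is outside [1.2.1, 2.0.0)

# The relaxed constraint: ">=0.6.0, <0.7.0".
print(in_range(pinned, parse("0.6.0"), parse("0.7.0")))  # True
```

The first check fails because `0.6.0` is below the caret range's lower bound, which is consistent with having to relax the constraint to `>=0.6.0, <0.7.0`.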
However, after making that change, when I re-ran `$ poetry install`, I got a new, unrelated conflict:
There may be additional conflicts beyond this second one. I do not plan to post about each additional one in a comment here. I am noting them in comments in the `pyproject.toml` file (alongside the many `TODO` comments there).
Yahoo! There were only 1-2 additional conflicts and they are all resolved now. Poetry has generated its `poetry.lock` file.
I created a draft PR (linked above by GitHub) containing the `pyproject.toml` and `poetry.lock` files. I think there's still a ways to go here (e.g. updating Dockerfiles, updating the Makefile, updating documentation).
The PR is continuing to take shape. All the GitHub Actions workflows are passing again (with Poetry in the mix and the `requirements` directory gone). I'll look at this some more next week.
Background
Currently, the Runtime uses a custom dependency management solution that works something like this:
1. Add the dependency to a `requirements/*.in` file: `main.in` for production dependencies and `dev.in` for development dependencies
2. Run `$ make update-deps`, which, under the hood, runs these commands, which update the `main.txt` and `dev.txt` files: https://github.com/microbiomedata/nmdc-runtime/blob/f0e809646f20b1c635c2ba1324f3294d4dd4fb0b/Makefile#L6-L16
3. Run `$ make init`, which, under the hood, runs these commands, which install the packages listed in `main.txt` and `dev.txt` into the current Python environment
Pros:
Cons:
Proposal
Switch to using Poetry.
If the Runtime used Poetry, the above process would become:
1. Add the dependency to the `pyproject.toml` file (there are different sections of the file for production versus development dependencies)
2. Run `$ poetry install` to install the packages listed in the file; this will generate a `poetry.lock` file if one doesn't already exist (commit this file to the repo once it exists)