Closed sadamov closed 1 month ago
Better solution proposed here by @leifdenby: https://docs.google.com/document/d/1csgMjyuE1Up2m262e-N7qMPmNJ-zHLguGyrgvnjTuBE/edit
handle package dependencies with pdm rather than pip
what: put all dependencies in pyproject.toml with pdm and sync pyproject.toml to requirements.txt for pip-based installs
why: manually keeping requirements.txt updated is a tedious and error-prone task. requirements.txt files don't fully describe a package (version, description) and are therefore no longer the recommended way of defining Python packages.
@leifdenby I added you as a reviewer here so that you can take what is useful from this PR, but then please implement the more robust variant without the requirements.txt file mentioned above. You can either use this branch or supersede this PR after your work is complete. I will give you access to the MeteoSwiss fork, since that is where this PR branch lives.
I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.
I agree that requirements handling can be somewhat tedious, but I don't think the project is at a stage that this is a problem yet.
Yes, I agree adding another tool isn't something we should just do for the sake of it, and I think we can actually make it optional. By this I mean that the package dependencies will be defined in pyproject.toml (rather than requirements.txt) and this could in theory still be updated by hand, and we can then sync this automatically to a requirements.txt file if needed (this is done by a pre-commit hook). Newer versions of pip can install directly from pyproject.toml, and so using pdm to manage and install dependencies would be optional.
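The sync from pyproject.toml to requirements.txt mentioned above could look roughly like the following. This is a minimal sketch and not the pre-commit hook from this PR: the function name `requirements_from_pyproject` and the example dependencies are hypothetical, and it only handles the `dependencies` list in the `[project]` table.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API


def requirements_from_pyproject(pyproject_text: str) -> str:
    """Return requirements.txt content for the [project] dependencies list.

    Hypothetical helper for illustration; optional or dev dependencies
    are deliberately ignored.
    """
    data = tomllib.loads(pyproject_text)
    deps = data.get("project", {}).get("dependencies", [])
    # One requirement specifier per line, as pip expects.
    return "".join(f"{dep}\n" for dep in deps)


# Hypothetical pyproject.toml content, not the real neural-lam metadata.
EXAMPLE = """
[project]
name = "neural-lam"
version = "0.1.0"
dependencies = [
    "numpy>=1.24",
    "torch>=2.0",
]
"""

if __name__ == "__main__":
    print(requirements_from_pyproject(EXAMPLE), end="")
```

Because pyproject.toml stays the single source of truth, the generated requirements.txt never needs to be edited by hand.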
For someone not using pdm the process of setting up a new development environment would be:
1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
2. Create a virtualenv with mkvirtualenv ... and then activate ...
3. Install neural-lam editable with pip so that changes you make are reflected in the site-packages version (actually this is simply a link): pip install -e .
4. Train neural-lam: python -m neural_lam.train_model --config-file config.yaml
5. Add a dependency with pip and edit pyproject.toml by hand, i.e. python -m pip install ... and edit the pyproject.toml file

I like using a package manager like pdm (other options are pipenv, poetry, uv) because it makes the development process easier by 1) automatically updating package versions in pyproject.toml to ensure versions work together when I add a new package, 2) allowing me to "dry run" any dependency modification without clobbering my current env (this has hurt me quite a few times with pip) and 3) handling creating and activating virtual envs.
So with pdm the development process would be:
1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
2. Create a virtualenv with pdm venv create and either activate it with pdm venv activate ... or make it the default when using pdm run python ... with pdm use ...
3. Install the dependencies with pdm install
4. Add a dependency with pdm add ... (this is where you can do pdm add --dry-run ... to see what would change before you install a package), or remove one with pdm remove ... You can also add dev dependencies separately (which won't be included in wheels) with pdm add --dev ... or dependency groups with pdm add --group visualisation matplotlib for example (if we didn't want visualisation tools installed by default)
5. Train neural-lam: python -m neural_lam.train_model --config-file config.yaml (if using activated virtualenv) or pdm run python -m neural_lam.train_model --config-file config.yaml
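To make the effect of a dependency group concrete, here is a small sketch of which packages would be installed with and without the visualisation group. It assumes that pdm add --group visualisation matplotlib records the package under `[project.optional-dependencies]`, which is my understanding of pdm's behaviour; the `numpy` entry and the helper name `packages_to_install` are hypothetical.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same API

# Hypothetical pyproject.toml content after adding matplotlib to a
# "visualisation" group; not the real neural-lam metadata.
PYPROJECT = """
[project]
name = "neural-lam"
dependencies = ["numpy"]

[project.optional-dependencies]
visualisation = ["matplotlib"]
"""


def packages_to_install(pyproject_text, groups=()):
    """List default dependencies plus those of the requested groups."""
    project = tomllib.loads(pyproject_text).get("project", {})
    packages = list(project.get("dependencies", []))
    optional = project.get("optional-dependencies", {})
    for group in groups:
        # A KeyError here means the group is not defined in the file.
        packages.extend(optional[group])
    return packages


if __name__ == "__main__":
    print(packages_to_install(PYPROJECT))
    print(packages_to_install(PYPROJECT, ["visualisation"]))
```

With pip, groups stored this way are requested as extras, e.g. pip install ".[visualisation]".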
Thanks for clarifying and outlining that Leif. I think that all sounds good, and the actual change is just to move dependencies to pyproject.toml. Then users can use whatever tool they want to handle their environment. We could keep syncing a requirements.txt as well, but I would not mind just requiring people to have a new enough version of pip to install directly from pyproject.toml.
Will see what I end up doing myself going forward. The pdm workflow seems convenient as well.
Can we close this if it is fully superseded by #37?
Superseded by #37
This PR simplifies the project setup and updates the requirements file to ensure compatibility and ease of installation. The following changes have been made:
Updated requirements.txt:
Updated README.md:
This should simplify the installation of the environment. The installation is now fully flexible without pinned deps. For papers and such, pinned envs can still be exported of course.