mllam / neural-lam

Neural Weather Prediction for Limited Area Modeling
MIT License

Simplified pip env setup #27

Closed. sadamov closed this 1 month ago

sadamov commented 1 month ago

This PR simplifies the project setup and updates the requirements file to ensure compatibility and ease of installation. The following changes have been made:

- Updated requirements.txt
- Updated README.md

This should simplify setting up the environment. The installation is now fully flexible, without pinned dependencies. For papers and similar reproducibility needs, pinned environments can of course still be exported.
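
As a rough sketch of that export, using plain pip (the output file name is just illustrative):

```bash
# Snapshot the currently installed (otherwise unpinned) environment into a
# pinned requirements file, e.g. to accompany a paper.
python -m pip freeze > requirements-pinned.txt
```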

sadamov commented 1 month ago

A better solution is proposed here by @leifdenby: https://docs.google.com/document/d/1csgMjyuE1Up2m262e-N7qMPmNJ-zHLguGyrgvnjTuBE/edit

What: handle package dependencies with pdm rather than pip, i.e. put all dependencies in pyproject.toml (managed by pdm) and sync pyproject.toml to a requirements.txt for pip-based installs.

Why: keeping requirements.txt updated by hand is a tedious and error-prone task, and a requirements.txt does not fully describe a package (version, description), so requirements files are no longer the recommended way of defining Python packages.

sadamov commented 1 month ago

@leifdenby I added you as a reviewer here so that you can take what is useful from this PR, but please then implement the more robust variant without a requirements.txt file, as mentioned above. You can either use this branch or supersede this PR after your work is complete. I will give you access to the MeteoSwiss fork, since that is where this PR branch lives.

joeloskarsson commented 1 month ago

I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

I agree that requirements handling can be somewhat tedious, but I don't think the project is at a stage where this is a problem yet.

leifdenby commented 1 month ago

> I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

Yes, I agree adding another tool isn't something we should do just for the sake of it, and I think we can actually make it optional. By this I mean that the package dependencies will be defined in pyproject.toml (rather than requirements.txt); this could in theory still be updated by hand, and we can then sync it automatically to a requirements.txt file if needed (via a pre-commit hook). Newer versions of pip can install directly from pyproject.toml, so using pdm to manage and install dependencies would be optional.
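
A minimal sketch of the two paths described here; the `pdm export` invocation is an assumption about what such a pre-commit hook might run, not something fixed by this thread:

```bash
# Without pdm: a reasonably recent pip (roughly 21.3+ for editable installs)
# can resolve the dependencies declared in pyproject.toml directly.
python -m pip install -e .

# With pdm available: regenerate requirements.txt from pyproject.toml and the
# lock file, e.g. from a pre-commit hook (exact flags are illustrative).
pdm export -o requirements.txt
```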

For someone not using pdm, the process of setting up a new development environment would be (a condensed shell sketch follows the list):

  1. Clone the repo (`git clone http://github.com/<username>/neural-lam`) and change directory (`cd neural-lam`)
  2. Optionally create your virtualenv with your favourite tool, maybe `mkvirtualenv ...`, and then activate it
  3. Install neural-lam editable with pip so that changes you make are reflected in the site-packages version (actually this is simply a link): `pip install -e .`
  4. Edit and run neural-lam: `python -m neural_lam.train_model --config-file config.yaml`
  5. Manually install a dependency with pip and edit pyproject.toml by hand, i.e. `python -m pip install ...` and then edit the pyproject.toml file
  6. Stage, commit and push!
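
Condensed into a single shell sketch (assuming a plain `python -m venv` for step 2; any virtualenv tool works, and `<username>` / `<some-package>` are placeholders):

```bash
# 1-2: clone and (optionally) create + activate a virtual environment
git clone http://github.com/<username>/neural-lam
cd neural-lam
python -m venv .venv && source .venv/bin/activate

# 3: editable install, so local changes are picked up without reinstalling
python -m pip install -e .

# 4: edit and run
python -m neural_lam.train_model --config-file config.yaml

# 5: add a dependency manually, then record it in pyproject.toml by hand
python -m pip install <some-package>
```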

I like using a package manager like pdm (other options are pipenv, poetry, uv) because it makes the development process easier by 1) automatically updating package versions in pyproject.toml to ensure versions work together when I add a new package, 2) allowing me to "dry run" any dependency modification without clobbering my current env (this has hurt me quite a few times with pip), and 3) handling the creation and activation of virtual envs.

So with pdm the development process would be (again, a shell sketch follows the list):

  1. Clone the repo (`git clone http://github.com/<username>/neural-lam`) and change directory (`cd neural-lam`)
  2. Create a virtual env with `pdm venv create` and either activate it with `pdm venv activate ...` or make it the default for `pdm run python ...` with `pdm use ...`
  3. Install the package: `pdm install`
  4. Add a dependency with `pdm add ...` (this is where you can do `pdm add --dry-run ...` to see what would change before you install a package), or remove one with `pdm remove ...`. You can also add dev dependencies separately (which won't be included in wheels) with `pdm add --dev ...`, or dependency groups with, for example, `pdm add --group visualisation matplotlib` (if we didn't want visualisation tools installed by default)
  5. Edit and run neural-lam: `python -m neural_lam.train_model --config-file config.yaml` (if using an activated virtualenv) or `pdm run python -m neural_lam.train_model --config-file config.yaml`
  6. Stage, commit and push!
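
And the same steps condensed into a shell sketch (matplotlib is used only as the example package, as in step 4):

```bash
# 1: clone
git clone http://github.com/<username>/neural-lam
cd neural-lam

# 2: create a pdm-managed virtualenv and select it (pdm use) or activate it (pdm venv activate)
pdm venv create
pdm use

# 3: install neural-lam and its dependencies from pyproject.toml
pdm install

# 4: preview, add (optionally into a group), or remove dependencies
pdm add --dry-run matplotlib
pdm add --group visualisation matplotlib
pdm remove matplotlib

# 5: run training through the managed environment
pdm run python -m neural_lam.train_model --config-file config.yaml
```
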
joeloskarsson commented 1 month ago

Thanks for clarifying and outlining that, Leif. I think that all sounds good, and the actual change is just to move dependencies to pyproject.toml. Then users can use whatever tool they want to handle their environment. We could keep a synced requirements.txt as well, but I would not mind just requiring people to have a new enough version of pip to install directly from pyproject.toml.

Will see what I end up doing myself going forward. The pdm workflow seems convenient as well.

joeloskarsson commented 1 month ago

Can we close this if it is fully superseded by #37?

sadamov commented 1 month ago

Superseded by #37