umami-hep / umami

Mirror of the gitlab umami project
Apache License 2.0

[JOSS Review]: Python packaging is required #6

Open matthewfeickert opened 5 months ago

matthewfeickert commented 5 months ago

(Continuing from https://github.com/openjournals/joss-reviews/issues/5833#issuecomment-1990949331)

In the Submitting a paper to JOSS docs it is noted that

In addition, JOSS requires that software should be feature-complete (i.e., no half-baked solutions), packaged appropriately according to common community standards for the programming language being used (e.g., Python, R), and designed for maintainable extension (not one-off modifications of existing tools).

which means that Python packages should be distributed via a package index (ideally PyPI, or conda-forge if there are complicated external binaries) and appropriately packaged such that they can be installed with their runtime dependencies. Providing Linux container images in lieu of proper packaging isn't sufficient for JOSS publications (though they are welcome as an addition!). For submissions like umami that are not designed to be more general libraries, the motivation for distributing on a package index is lower and so won't be strongly recommended; however, all JOSS submissions need to be packaged appropriately.

In short, this means that for libraries all the required runtime dependencies need to be included in the packaging metadata with known lower bounds (if you don't want to empirically determine these lower bounds it is okay to select reasonable known working versions) so the installer knows what versions are required. For a Python application, in addition to the library requirements, a requirements.txt or (preferably) a lock file (created by some well established tool like pip-tools, conda-lock, poetry, uv as there is currently no formal Python lock file standard) should also be provided.

(An excellent, and current, overview of packaging is provided by the Scientific Python organization Development Guide and as umami doesn't have compiled extensions the simple packaging section would be relevant.)

From e19017383d1385978d7926b036e284f7602fee30, umami's build backend is setuptools

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/setup.py#L9-L11

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/pyproject.toml#L1-L2

though all dependency metadata is missing from setup.cfg.

The current umami installation docs note

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/docs/setup/installation.md?plain=1#L231-L237

though it is unclear what restrictions exist that would keep tensorflow from being installed into a clean virtual environment.
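One quick way to probe this, sketched below under the assumption that a umami checkout is the current working directory (paths and names are illustrative), is to attempt the install into a fresh virtual environment:

```shell
# Sketch: check whether umami (and tensorflow) install into a clean
# virtual environment. Assumes a umami checkout is the current directory.
python -m venv /tmp/umami-test-venv
/tmp/umami-test-venv/bin/python -m pip install --upgrade pip
/tmp/umami-test-venv/bin/python -m pip install .
/tmp/umami-test-venv/bin/python -c "import umami"
```

If the last step fails because tensorflow (or anything else) is missing, that would point to a gap in the declared dependency metadata rather than an environment restriction.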

My recommendation for packaging things properly with the least amount of work is:

  1. Establish from the source code what the required runtime dependencies are and provide them with lower bounds in your packaging metadata. If you wanted to take this as an opportunity to simplify your configuration files and move to using only pyproject.toml, that would be welcome, though it is not required and you can keep setup.cfg if you want. Regardless, please update the build system metadata in pyproject.toml to provide build backend information:
```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"
```

I would assume that most of these dependencies are already well summarized in

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/requirements.txt#L1-L16

though tensorflow is notably absent and so perhaps others are as well.
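As a sketch of what the resulting metadata could look like if you went the pyproject.toml-only route (the package names and lower bounds below are illustrative placeholders taken from requirements.txt, not empirically determined minimums):

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "umami"
version = "0.0.0"  # illustrative; keep your real versioning scheme
dependencies = [
    # Illustrative names and lower bounds only; use known-working
    # versions from your actual environment.
    "numpy>=1.21",
    "h5py>=3.0",
    "tensorflow>=2.9",
]
```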

  2. Update requirements.txt to provide correct application run requirements with deploy/run versions pinned (or optionally produce a lock file).
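As a minimal sketch of two ways to produce such pins (assuming a known-working environment with umami's dependencies already installed; the pip-tools route additionally assumes pip-tools is installed):

```shell
# Option A: freeze the current, known-working environment into exact pins.
python -m pip freeze --all > requirements.txt

# Option B (assumes pip-tools is installed): compile pins from loose
# constraints, recording where each pin comes from.
#   python -m pip install pip-tools
#   pip-compile --output-file=requirements.txt pyproject.toml
```

Option B has the advantage that the loose constraints and the pinned deploy versions live in separate files, so updating pins is a single recompile.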

  3. It seems that images from TensorFlow are used as base images for the Docker container images

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/.gitlab/workflow/.docker-gitlab-ci.yaml#L58-L59

https://github.com/umami-hep/umami/blob/e19017383d1385978d7926b036e284f7602fee30/docker/umamibase/Dockerfile#L17-L21

so if having tensorflow in a requirements.txt file is going to cause conflicts here, you can always do a precautionary clean with sed

```dockerfile
RUN sed -i '/tensorflow/d' requirements.txt && python -m pip install -r requirements.txt
```

You might have more clever ideas, but that's a naive suggestion.
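To be concrete about what that sed invocation does, here is a small self-contained demonstration (the file contents are made up for illustration):

```shell
# Demonstrate the precautionary clean: delete any line mentioning
# tensorflow from a requirements file before installing.
printf 'numpy==1.24.0\ntensorflow==2.11.0\nh5py==3.8.0\n' > /tmp/demo-requirements.txt
sed -i '/tensorflow/d' /tmp/demo-requirements.txt
cat /tmp/demo-requirements.txt
# → numpy==1.24.0
#   h5py==3.8.0
```

(Note that GNU sed's in-place `-i` flag, as used in the Dockerfile's Linux base image, differs from BSD/macOS sed, which requires `-i ''`.)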

  4. Update the installation docs to reflect that the dependencies are provided correctly.

This is a lot of text, but the work required here I think is quite minimal as I believe most of the information already exists. If the authors have questions about any of the above, please ask here. :+1:

matthewfeickert commented 3 months ago

:wave: @philippgadow I'm just checking in on this Issue for https://github.com/openjournals/joss-reviews/issues/5833 not to rush you and the team or be pushy, but to make it clear that I haven't forgotten about this review and abandoned it. If there are questions that I can provide any recommendations or further input on I am happy to do so.

afroch commented 1 month ago

Hi @matthewfeickert. Sorry for the long delay; I'm currently looking into your comments above. I have already addressed your bullet points 3 and 4, but I have a question concerning your first and second bullet points:

1: If I understand this correctly, the TODO here is to update pyproject.toml with the little code snippet you provided and to add the requirements from the requirements.txt file to setup.cfg. Is this correct? If so, I saw there is a possibility to do this dynamically. I guess this would be good?
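For context, the dynamic mechanism I found is (if I read the setuptools docs correctly) along these lines, where the dependency list is read from requirements.txt at build time:

```toml
[project]
name = "umami"
dynamic = ["dependencies"]

[tool.setuptools.dynamic]
dependencies = { file = ["requirements.txt"] }
```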

2: Which packages are you talking about when you write "correct application run requirements with deploy/run versions pinned"? From my point of view, we have pinned all the different packages that we use to the required versions. Could you clarify what you meant by that?