icesat2py / icepyx

Python tools for obtaining and working with ICESat-2 data
https://icepyx.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Move `requirements*.txt` files into `pyproject.toml` #552

Open mfisher87 opened 1 month ago

mfisher87 commented 1 month ago

I think it's more conventional to express these as extras (optional dependency groups) in the project metadata. Then they can be installed as desired and in combination, e.g. with pip install --editable '.[dev,docs]'

https://github.com/scientific-python/cookie/blob/0dd0032a08231ed246caccfd6be4b7e82466fe51/pyproject.toml#L38-L51
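A sketch of what that could look like in pyproject.toml — the package lists below are illustrative placeholders, not the project's actual dependencies:

```toml
[project.optional-dependencies]
# hypothetical groups; the real lists would come from
# requirements-dev.txt and requirements-docs.txt
dev = [
    "pre-commit",
    "pytest",
]
docs = [
    "sphinx",
]
```

With something like this in place, pip install --editable '.[dev,docs]' installs the package plus both groups in one step.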

weiji14 commented 1 month ago

Once upon a time, there was a good reason to keep the requirements.txt file in the top level folder because that was what GitHub used to create the 'Dependency graph' according to https://docs.github.com/en/code-security/supply-chain-security/understanding-your-software-supply-chain/about-the-dependency-graph#supported-package-ecosystems.


Nowadays, I think that's not true (at least I can see dependents on my repo that doesn't have a requirements.txt file), so we could probably remove 'requirements.txt' if needed. But happy with just removing requirements-dev.txt and requirements-docs.txt only (will need to update the docs and CI scripts relying on it).

mfisher87 commented 1 month ago

I'm surprised the doc wasn't updated!

JessicaS11 commented 1 month ago

Then they can be installed as desired and in combination, e.g. with pip install --editable .[dev,docs]

Can you also install this way into a conda environment? Maybe I'll understand better after Don and Romina's packaging tutorial, but this has historically been a topic that feels extra challenging to me (which I interpret to mean it's likely even more challenging for other less technically-interested researchers). I can easily create a conda environment with multiple requirements files. My recollection is I cannot do that if the requirements are all only in the metadata.

we could probably remove 'requirements.txt' if needed.

We'd need to then add an environment file for users (and update the docs accordingly).
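For context, the multi-requirements-file conda workflow described above presumably looks something like this (environment name illustrative; conda create accepts --file more than once):

```
conda create --name icepyx-dev --file requirements.txt --file requirements-dev.txt
conda activate icepyx-dev
```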

mfisher87 commented 1 month ago

Oh my goodness, yes, this is challenging for me as well. I want to take a moment to be grateful for conda-forge; it's been so good for how I think about packaging! :rocket: :star:

Can you also install this way into a conda environment?

Yes, if the conda environment has pip installed! How are you currently installing requirements.txt into a conda environment?

My personal opinion: this project is a pure Python library which does not depend on conda packages, but we do expect both conda users and non-conda users among this package's contributors and users. For the purposes of this discussion, I think we can ignore end users -- sorry, still love you :heart: -- but we serve them separately by packaging for conda-forge.

I think we can best serve this audience of contributors by doing things in a way that's most portable to and from other projects, which may look like two steps: (1) Create a new environment with your environment manager. We'll show an example of how to do this with conda. (2) Install the package using pip into the new environment.

# (1) Create a new environment (containing pip) and activate it
conda create --name icepyx-dev pip
conda activate icepyx-dev
# (2) Install the package into it with pip; quotes keep the shell from globbing the brackets
pip install --editable '.[dev,docs]'

What do you think?

JessicaS11 commented 4 weeks ago

Yes, if the conda environment has pip installed! How are you currently installing requirements.txt into a conda environment?

This is a deal breaker for me. I want to be able to create my environment with all potential dependencies up front and then only have to pip install -e icepyx (emphasis on the editable part). I've had enough headaches to last a lifetime from pip installing too many packages after creating a skeleton new conda environment. I'm not 100% closed off to this idea, but I need to do a lot more reading and learning before I'm comfortable going ahead with it.

JessicaS11 commented 4 weeks ago

I'm not sure what changes you're requesting here or what the dealbreaker is

I'm saying I'm not comfortable with the PR title: "Move requirements*.txt files into pyproject.toml"

mfisher87 commented 4 weeks ago

I'm sorry, I'm still not sure I understand. The PR is titled "Move dependencies from requirements files into pyproject.toml, add extra groups for dev, docs". What would you prefer for the title?

JessicaS11 commented 3 weeks ago

I'm saying I'm not comfortable with the PR title: "Move requirements*.txt files into pyproject.toml"

I see that this isn't necessarily any clearer - sorry about that. It's not the actual title, but the task described by the title. Literally, I'm not comfortable with "mov[ing] requirements*.txt files into pyproject.toml".

mfisher87 commented 3 weeks ago

Thanks for clarifying, that makes sense to me. It sounds like you need more info, but aren't ruling this out, correct?

Under that assumption, I'd suggest checking out:

weiji14 commented 3 weeks ago

Not to take sides, but I think I get @JessicaS11's concern that the shift here means that we will be mixing conda-forge dependencies with pip dependencies. The current development environment setup uses this command:

https://github.com/icesat2py/icepyx/blob/6c6366aa004f4f33821348efff815a9d35650565/doc/source/contributing/how_to_contribute.rst?plain=1#L95

What this command actually does is install the dependencies listed in requirements*.txt from conda-forge. We could just install Python via conda-forge and the other dependencies via PyPI, but keeping such a mixed conda/pip environment intact requires a level of discipline, and if you accidentally run conda install xxx in that environment, things will break. The best case is still to get everything from conda-forge and let the conda solver handle dependency conflicts, rather than trusting that conda's and pip's resolvers play nicely with each other.

mfisher87 commented 3 weeks ago

First: All of the below are my opinions, and if I'm not convincing anyone, then no problem!

What this command does, is actually install the dependencies listed in requirements*.txt from conda-forge.

True (well, partly... the command doesn't actually work as written, but I see what the intent is)! This is part of my concern -- as this is a pure Python package, the conda-forge distribution is downstream of our build target, PyPI. I feel it's unusual and unintuitive for development to happen against dependencies from a distribution downstream of our build target.

We could just install Python via conda-forge and the other dependencies via PyPI, but keeping this mixed conda/pip environment intact requires a level of discipline, and if you accidentally run conda install xxx in that environment, things will break.

Agreed, there are downsides to this approach, but personally I feel they're smaller than the downsides of the current approach. I think it's OK to use conda to create environments and use pip to install into them, as long as you know to use conda as an environment manager, not a dependency manager.

IMO the best approach overall is to avoid this confusion by not centering the contributor documentation on conda. If we want conda-specific contributor documentation, I think we should treat it as an exception ("if you want to use conda...") and clarify the distinction between dependency management and environment management. Our documentation already mixes conda and pip commands without warning of the possible problems, so in my view this would be an improvement! (There is no way to do an editable install with conda itself; support for conda develop was dropped years ago, and it was calling out to pip anyway.)

The best case is still to get everything from conda-forge and let the conda solver handle dependency conflicts rather than trusting that conda/pip's resolvers play nicely with each other.

Under the assumption that all contributors will use conda, I agree, but I think it's important to make contributing accessible to people who use a more conventional pure Python development toolchain instead of conda. Of course, without disadvantaging contributors who want to use conda.

I'd be happy to update the docs on this PR if it helps to "show, don't tell" what I'm proposing.
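For what it's worth, the conda-free contributor workflow I have in mind is just the standard venv one (directory name illustrative):

```
python -m venv .venv
source .venv/bin/activate
# quotes keep zsh from treating the brackets as a glob pattern
pip install --editable '.[dev,docs]'
```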

JessicaS11 commented 1 week ago

I've done some reading and, though I cannot find any good arguments for WHY one should do dependency resolution with pip over conda (even for a pure python package), I can see the logic for having optional dependencies all contained within pyproject.toml instead of separate files. So while I am not enthusiastically in favor of this transition, I am okay with moving forward on it.

Under the assumption that all contributors will use conda, I agree, but I think it's important to make contributing accessible to people who use a more conventional pure Python development toolchain instead of conda. Of course, without disadvantaging contributors who want to use conda.

My personal perspective here is that a pure Python developer is more likely to be comfortable navigating across pip + conda (and can always pip install -r requirements.txt -r requirements-dev.txt), while a contributor using conda is less likely to have an intuition for using (e.g.) pip install 'icepyx[dev]' or for understanding why their mixed pip + conda environment isn't working. Since there's never going to be a solution that makes everyone happy, the best we can do is make sure our docs are very clear that contributors may need to install their dev environment differently than they otherwise might (if they're just using conda). A short list of what we'll need to mention includes (this is as much a mental note for myself as anything else):

mfisher87 commented 1 week ago

So while I am not enthusiastically in favor of this transition, I am okay with moving forward on it.

I don't want to push through on this if you don't feel it's in the project's best interest. I may have strong opinions, but I'm OK with holding onto them weakly :)

I cannot find any good arguments for WHY one should do dependency resolution with pip over conda (even for a pure python package)

I don't know if it's a "good" argument or not :) but my thinking on this is almost entirely about reducing complexity. When installing a dev environment from a downstream distribution (conda-forge) of our actual target (PyPI), it's possible for unexpected differences in those distributions to make supporting build errors or users of the PyPI distribution more complex. It's just easier for me personally to reason about

flowchart LR

  PyPI --> dev
  dev --> PyPI
  PyPI --> conda-forge

than

flowchart LR

  conda-forge --> dev
  dev --> PyPI
  PyPI --> conda-forge

In the former model, we only think about conda-forge in the context of the feedstock, but in the latter, we need to think about it in our development loop as well.

better yet with all steps completed via a top-level environment.yml file

:100:
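For the record, such a top-level environment.yml might look roughly like this (a sketch; the Python pin is illustrative, and the pip: subsection hands the editable install plus extras off to pip):

```yaml
name: icepyx-dev
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      # editable install of the local package with its extras
      - --editable .[dev,docs]
```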