pypa / packaging.python.org

Python Packaging User Guide
http://packaging.python.org

Canonical guide to specifying dependencies? #685

Open pganssle opened 4 years ago

pganssle commented 4 years ago

We should probably have a canonical guide to installing dependencies on packaging.python.org, especially since PEP 508 has changed things like how to link to a git repository. I get a lot of questions like, "Why are my dependency links not working now?", and since I don't really use those, all I can tell people is, "Oh, something about that changed; here's an inscrutable PEP that might help."

This could be done incrementally (and can be broken into smaller parts, provided there is a "clearing house" page that links to the other pages), but it would be nice to clearly cover the following (a rough syntax sketch follows the list):

  1. The syntax for declaring dependencies
  2. The syntax for environment markers
  3. How to declare dependencies to a URL (i.e. not PyPI)
  4. The differences between what goes in a requirements.txt file and what can be declared in install_requires
  5. The use of constraints.txt files (which I think really reduces the need for requirements.txt).
  6. How to use extras.
  7. What is and is not allowed as a dependency in packages uploaded to PyPI.
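
As a rough illustration of the syntax for items 1–3 and 6 (package names, versions, and URLs below are made up, not a recommendation):

```toml
[project]
name = "example-app"    # hypothetical project
version = "0.1.0"
dependencies = [
    "requests >=2.28",                                       # 1. version specifier syntax
    "colorama >=0.4; platform_system == 'Windows'",          # 2. environment marker
    "mypkg @ git+https://github.com/example/mypkg@v1.0.0",   # 3. direct URL dependency (not accepted by PyPI, see 7)
    "httpx[http2] >=0.24",                                    # 6. requiring another project's extra
]

[project.optional-dependencies]
cli = ["click >=8.0"]                                          # 6. defining an extra of your own
```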

Some of this is defined in various PEPs, but my understanding is that the canonical and current documentation should live on packaging.python.org.

There should also probably be a "hat note" (Wikipedia terminology) pointing from the managing application dependencies page to any such page.

venthur commented 1 year ago

I also stumbled this weekend on the fact that basically nowhere in our docs do we describe why you would want to use a requirements.txt file. I was wondering myself, now (i.e. in mid-2022) that we have changed the tutorial from recommending setup.py to pyproject.toml, whether you still need requirements.txt or whether you can put the pinned dependencies in pyproject.toml. To my surprise, we don't even explain anywhere what requirements.txt files are supposed to do.

My current assumption is: nothing has really changed, except that you put your (mostly unpinned) dependencies in pyproject.toml instead of setup.py, so your library can be installed as a dependency of something else without causing much trouble when resolving version constraints.

On top of that, for "deployable applications" (for lack of a better term), you still want to maintain a separate requirements.txt with exact version pinning.

If that's correct, I could write a document explaining that in a nice way.
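
Roughly, the split I have in mind looks like this (names and pins below are just placeholders): the project metadata carries the loose, abstract dependencies, and the deployable application keeps a fully pinned requirements.txt next to it.

```toml
# pyproject.toml: loose (abstract) dependencies of the project itself
[project]
name = "example-lib"
version = "1.0.0"
dependencies = [
    "requests >=2.28",
    "packaging >=23.0",
]
```

```text
# requirements.txt of the deployed application: concrete, fully pinned,
# including transitive deps
requests==2.31.0
packaging==23.2
certifi==2023.11.17
charset-normalizer==3.3.2
idna==3.6
urllib3==2.1.0
```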

pradyunsg commented 1 year ago

That's correct, yes.

Extracting the essential argument from https://caremad.io/posts/2013/07/setup-vs-requirement/ and adapting it to clearly reflect the difference between abstract vs concrete requirements in a world where we have pyproject.toml is worthwhile IMO.

danesss commented 1 year ago

After reading the install_requires vs requirements files discussion and the abstract vs. concrete argument, I agree that there is a difference in how libraries and "deployable applications" should pin versions.

However, I find that many of the stated reasons to justify separate install_requires and requirements.txt do not apply to most of my projects.

If a "deployable application" cares only about concrete dependencies it would be nice to avoid the duplication. What are the downsides of defining concrete (pinned) dependencies exclusively in a project.dependencies section in a PEP621-compliant pyproject.toml? (or in setuptools install_requires for that matter)

stefaneidelloth commented 1 year ago

I agree that the content of requirements.txt should be moved to pyproject.toml.

Either as an extra section, or with a way to specify both non-pinned and pinned versions for a dependency, e.g.

dependencies = [ ['pylint', '>=2.17.4', '2.17.5'] ]

Also see

https://stackoverflow.com/questions/74508024/is-requirements-txt-still-needed-when-using-pyproject-toml/76548271#76548271

and

https://github.com/pypa/pip/issues/12100

nobody4t commented 9 months ago

I have been stuck on this issue for quite a while. It can take hours to install the deps if I put them in setup.py or pyproject.toml, and a cyclic dependency is found. But if I put all the deps in requirements.txt, it finishes very quickly. Why? Maybe requirements.txt is the best way after all.

webknjaz commented 9 months ago

If a "deployable application" cares only about concrete dependencies it would be nice to avoid the duplication. What are the downsides of defining concrete (pinned) dependencies exclusively in a project.dependencies section in a PEP621-compliant pyproject.toml? (or in setuptools install_requires for that matter)

What your deployable app usually cares about is not just pinned deps. Well, partially yes. But what you typically need are these things:

  1. a way to declare what direct dependencies your app has
  2. a way to pin the entire dependency tree, transitive deps included (aka pip constraint files or lockfiles)
  3. a way to tell the installer to actually pin all the (transitive) dependencies in the tree to specific versions (pip constraint files)

You can generate such constraint files using pip-tools (pip-compile) and they are supported by pip natively.
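
For example (assuming a pip-tools version recent enough to accept pyproject.toml as input; the file names are arbitrary):

```console
# resolve the loose deps from pyproject.toml into a fully pinned constraints file
$ pip-compile --output-file=constraints.txt pyproject.toml

# install the project, forcing every package in the tree to the pinned versions
$ pip install -c constraints.txt .
```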

What many people don't realize, though, is that we typically have different envs where we need those constraints, and some envs may need additional deps. And what we're trying to make reproducible is not necessarily just an app, but all the (virtual)envs where it runs. For example, there's a prod env, and you want it to be reproducible. But there's also a test env where the tests run — this one usually includes all the prod deps plus the test deps. So each of these envs needs its own constraint file.

For libs, the situation may be even more complex: of course, they have loose direct deps in their metadata. But then, they may be running tests in many envs — under different OSes, architectures, and Python versions. Each of those envs will result in a different tree of dependencies being pulled in, so they'll all need separate constraint files. Depending on the scale of the test matrix, that can be a lot of lockfiles. So lib projects actually work with both types of requirements — their own runtime deps, and pinned deps for the virtualenvs where the tests run.

To answer the question regarding the downsides of putting pinned deps in project.dependencies — that's not the right place, since its semantic purpose is listing direct deps, which may be somewhat constrained but don't normally cover the entire dependency tree. What's needed additionally is a way to record all the pins for the envs where the project will run, and doing that in project.dependencies isn't very realistic if you have more than one such env. This adds another dimension and needs a different mechanism for dealing with the factors that make each env unique.
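
One way to sketch that per-env setup with pip-compile (the layout and file names are purely illustrative; in practice you'd end up with one file per OS/arch/Python combination):

```console
$ pip-compile -o requirements/prod.txt pyproject.toml
$ pip-compile --extra test -o requirements/test-py3.12-linux.txt pyproject.toml

# reproduce the test env: project + test extra, everything pinned
$ pip install -c requirements/test-py3.12-linux.txt '.[test]'
```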

slominskir commented 3 months ago

Do you really need two separate sets of dependencies, given that we already must be relying on semantic versioning to make this work anyway? Just specify the exact version of each dependency you used in your project; when a conflict occurs in someone else's project that uses yours and shares a transitive dependency, the tools can try to resolve it automatically by using a potentially newer version, but only if the newer version doesn't change the API (given it's semver). Using >= is dangerous unless it is paired with an upper bound (<) as well, since software often isn't backwards compatible; this upper bound should be implicit (semver). I assume this is how Java build tools (Maven, Gradle) work: they don't require you to pin explicit versions unless a conflict occurs that cannot be automatically resolved at build time, at which point you're notified of the conflict and must decide what to do.
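
To illustrate with the specifier syntax that already exists (the pins below are placeholders, and whether the upper bound should be explicit or implicit is exactly the point being argued):

```toml
# fragment of [project] in pyproject.toml
dependencies = [
    "requests>=2.28,<3",    # explicit lower and upper bound
    "packaging~=23.1",      # compatible-release operator: >=23.1, ==23.*
]
```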

seakros commented 3 months ago

Agreed that additional clarity could be useful here, as specifying dependencies in pyproject.toml is definitely not a drop-in replacement for a requirements.txt file, especially in the application context (where strict pinning is necessary).

With the general migration to pyproject.toml, it's perhaps use cases like these that are leaving folks a bit confused as to what the new right way to do things is, since it's not really talked about anywhere (at least I was a bit at a loss).

> Extracting the essential argument from https://caremad.io/posts/2013/07/setup-vs-requirement/ and adapting it to clearly reflect the difference between abstract vs concrete requirements in a world where we have pyproject.toml is worthwhile IMO.

Precisely this^

In the cases I was dealing with, the solution was not to duplicate the dependencies between requirements.txt and pyproject.toml, but just to keep them in the former, where I can also add --[extra-]index-url <url> (which is otherwise impossible to include in a pyproject.toml) as well as -e . to achieve a one-liner virtual environment install through pip install -r requirements.txt.
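
Roughly like this (the index URL and pins are placeholders):

```text
# requirements.txt
--extra-index-url https://pypi.example.org/simple
-e .
requests==2.31.0
some-internal-package==1.4.2
```

Then a single `pip install -r requirements.txt` sets up the whole virtual environment, project included.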

slominskir commented 3 months ago

I think the fundamental issue here is that Python guidance has historically been to define your dependencies in a flexible way so that upgrades occur automatically. The install_requires docs state:

It is not considered best practice to use install_requires to pin dependencies to specific versions, or to specify sub-dependencies (i.e. dependencies of your dependencies). This is overly-restrictive, and prevents the user from gaining the benefit of dependency upgrades

Having loose requirements is only a good idea if you like your CI build failing suddenly, even if you've made no changes (because some dependency lib released a new version). That is to say, never. Builds being repeatable is priority 1.

The "optimization" of using newer versions of libs automatically is asking for trouble. Each version bump should be tested. For security, automatic updates seems most compelling. However, this feature can be handled by automated tooling instead, which can detect security vulnerabilities in dependencies and a bot will even create a pull request for you to test before merging (GitHub does this for you automatically for example).

There is one important case where automatic upgrades make sense, but it is not "gaining the benefit of dependency upgrades": transitive dependency conflict resolution at build time. Transitive dependencies are a very difficult problem, and semver isn't a silver bullet. The question is which versions will work; sometimes the answer is none. The build tooling can communicate when conflicts occur and what automated resolutions were employed, if any. Presumably you'll be testing your software immediately afterwards. The ability to specify how a transitive conflict should be resolved with a specific version is necessary, but flexible version specifiers for auto-patching seem like an anti-pattern.
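
With today's tooling, one way to express "resolve this transitive conflict with this specific version" is a constraints file that pins only the conflicting package (placeholder names and versions):

```console
$ echo "urllib3==1.26.18" > constraints.txt
$ pip install -c constraints.txt example-app
```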

Pinned versions should be used for both libs and applications, though this only works for libs if the build tooling used by end users has the flexibility to override versions to resolve transitive dep conflicts. I think deps should be specified in only one place, probably pyproject.toml, with explicit pinned versions only. Tooling should handle the rest.