python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Backend-dependent PyTorch versions #6671

Closed BlueskyFR closed 1 year ago

BlueskyFR commented 1 year ago

Issue

Hi!

I do deep learning and am currently trying to switch to Poetry for better dependency management 😄 I immediately encountered a problem when trying to install PyTorch:

  1. PyTorch versions are backend-dependent: the latest PyTorch version has releases for, say, 20 different CUDA versions. Constraining everyone who clones the project to the same CUDA version makes no sense, and I guess Poetry cannot currently handle this.
  2. PyTorch is not on PyPI, more on that below.

In order to solve the first problem (even though official support from Poetry would be appreciated), I went with the famous light-the-torch, which automatically installs the right PyTorch version depending on the detected backend. The problem with this is that torch is not added to pyproject.toml afterwards, so if I run a subsequent poetry install XXX, the torch package is likely to be replaced by the torch from pip.

Also, specifying the PyTorch URL in the .toml is not a solution either, since it depends on the local backend. In a PyTorch project, the common factor is the PyTorch package version, not the backend on which it runs; the choice of backend only ensures the project can run on a variety of configurations.

This means that Poetry is not currently compatible with PyTorch. I don't think that saying that PyTorch is a special case is a good idea since this release design just exposes the need to compile a package for each particular software stack, so it is really a general problem which must be solved IMO.

I'll be happy to discuss below how Poetry must adapt to this design!

BlueskyFR commented 1 year ago

A solution would be to have a post-install hook that runs light-the-torch, but the torch it installs would have to be frozen afterwards to ensure it is not replaced by a subsequent poetry install.

neersighted commented 1 year ago

Duplicate #2145 -- this is in general pretty out of scope for Poetry as things currently exist, but possible with a plugin. Painful, but possible, and we can gradually introduce hooks to make this sort of thing easier.

Keep in mind that this is, in fact, highly specific to PyTorch -- there is no standard convention for distributing wheels built against different ML APIs. PyTorch does it one way, and other ML packages do it in other ways. A standard for describing and reasoning about wheel compatibility is needed for support in Poetry beyond a package-specific plugin, as the +cu111 et al. convention is just that -- ad hoc (mis)use of local versions that existing tools interact with in sometimes unexpected (to those not familiar with Python packaging) ways.

In order for broad support in the ecosystem (including natively in Poetry) to happen, standardization of ML APIs/ABIs is necessary as part of the wheel spec (or a successor).
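To make the +cu111 convention concrete, here is a minimal sketch of how a backend tag rides along in the local version segment of a version string (the part after the `+`). The helper names and the example version strings are illustrative assumptions, not part of Poetry or PyTorch; the point is only that two builds for different backends share the same public release and differ in a segment that environment markers have no way to select on.

```python
def split_local_version(version):
    """Split a version string into (public, local) parts.

    The local part is whatever follows the first "+", or None if absent.
    """
    public, sep, local = version.partition("+")
    return public, (local if sep else None)


def same_public_release(a, b):
    """True when two version strings differ at most in their local segment."""
    return split_local_version(a)[0] == split_local_version(b)[0]


# Illustrative version strings (assumed for the example).
for candidate in ("1.12.1+cu113", "1.12.1+cpu", "1.12.1"):
    print(candidate, "->", split_local_version(candidate))
```

Under this reading, a project that only cares about "torch 1.12.1" cannot express "whichever local variant fits this machine" with standard metadata, which is the gap discussed above.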

BlueskyFR commented 1 year ago

I totally agree that there is no standard for this, but the problem is still present, so I hope some help could be given on Poetry's side through tools such as hooks and package freezing.

neersighted commented 1 year ago

If you're willing to freeze versions there's no problem -- add the correct pytorch.org package index and you'll be locked to one API version.

Per #6409 performance leaves something to be desired as currently we emulate pip (+ the new resolver)'s behavior of checking every index exhaustively. However, you can do what you want today.

If you are talking about install-time selection of the proper variant, that is a full duplicate of the issue I linked. The best we'll be able to do until such time that we standardize markers in the ecosystem is adding some sort of hooks for custom markers -- but there's a lot of work necessary on the Plugin API before we can even think about such hooks.

BlueskyFR commented 1 year ago

Downloading 50 GB of packages is not an option for me, sadly, so I guess this leaves me with no solution.

neersighted commented 1 year ago

It's very unclear what you want, I think. Are you asking for some way to add packages to a Poetry environment using poetry install that bypasses the normal resolution process? Until PyTorch indexes implement PEP 658 (and we gain support), we will have to download wheels for every platform as a result of how Python packaging works at a fundamental level.

#4956 or a successor will help you if you want to limit the scope of compatibility (e.g. you never plan to install on Windows so you don't care about solving for Windows).

Basically, if you don't want to solve ahead of time and have a universal poetry install that works everywhere, Poetry may not be the right tool for you.

If you're willing to accept solving ahead of time requiring downloading PyTorch wheels, poetry export + ltt may just work for you -- many projects (including poetry-core and certbot) make use of Poetry for management of a requirements.txt list.

Secrus commented 1 year ago

@BlueskyFR I would try addressing this issue with the PyTorch team, since they are the ones doing non-standard things. I don't like the idea of Poetry, which is based on widely accepted standards, having to adapt to non-standard ways. The way I see it, they could have a simple wheel on PyPI that would provide a CLI for setting up a proper environment.

BlueskyFR commented 1 year ago

@neersighted sorry for being unclear. What I want is the following:

  1. I run poetry install on the cloned repo, which doesn't have torch in its dependencies
  2. I then run poetry run ltt install torch which installs the latest torch, compatible with my local backend
  3. I then "freeze" PyTorch so that poetry cannot replace it with the pip version if I then run poetry add X

Is it possible?

BlueskyFR commented 1 year ago

In the same spirit, what if I want to install a custom built PyTorch version?

neersighted commented 1 year ago

You're really asking for a feature where you can inject 'fake' packages into Poetry's resolution, so that Poetry considers them satisfied and solved for.

I'd create a new feature request issue for that -- the basic idea is that you would specify something like:

[tool.poetry]
dependencies-external = ["pytorch"]

And Poetry would consider pytorch: * to be provided and act like it was locked/installed, while not in fact locking/installing it at all, and just trusting you, the end user, to install it correctly (e.g. using ltt) so your code can run.

Please note the above design is ad-hoc -- what the final design would look like, and if this would be accepted by the project at all would have to be hammered out on the FR issue you create, and/or on the PR defining the implementation.
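As a rough illustration of the idea only: the sketch below shows what "consider it provided" could mean for a resolver. Everything here is hypothetical, including the function name and the example dependency table; `dependencies-external` does not exist in Poetry, and this is not how Poetry's solver is structured. It simply separates the dependencies that would be resolved and locked from those that are trusted to be installed by the user.

```python
def partition_dependencies(declared, external):
    """Split declared dependencies into (to_resolve, trusted).

    declared: mapping of package name -> version constraint.
    external: set of names the user promises to install themselves.
    """
    to_resolve = {n: spec for n, spec in declared.items() if n not in external}
    trusted = {n: spec for n, spec in declared.items() if n in external}
    return to_resolve, trusted


# Hypothetical project: numpy is resolved normally, pytorch is left to the user.
declared = {"numpy": "^1.23", "pytorch": "*"}
to_resolve, trusted = partition_dependencies(declared, external={"pytorch"})
print("resolve and lock:", to_resolve)
print("assume installed by the user:", trusted)
```

The trade-off is the one described above: nothing verifies that the trusted package is actually present or compatible, so a broken environment is the user's problem to fix.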

neersighted commented 1 year ago

> In the same spirit, what if I want to install a custom built PyTorch version?

You can do this today with URL dependencies and markers (but, as markers do not include any facility to discriminate based on ML API, this doesn't solve anything you couldn't do already with the pytorch indexes).

BlueskyFR commented 1 year ago

Thanks for the feedback. I don't think I have the time to write a feature request and follow it at the moment, as this is likely to take weeks and I am looking for a quick solution. So to wrap up, ltt is not compatible with Poetry in its current state, right?

neersighted commented 1 year ago

Poetry is not designed to interoperate with other tools that manipulate packages in its dependency tree, no. Even if we add the feature I described above, it will always be a best-effort/"it happens to work" sort of thing. That is to say, you're taking a lot into your own hands, and if Poetry's incomplete solution ignoring a package breaks when combined with LTT's, that's on you to solve, since it's not reasonably a problem with either Poetry or LTT.

BlueskyFR commented 1 year ago

That's right, but I am just disappointed by the fact that there is no way to manage dependencies in a PyTorch project 😢

neersighted commented 1 year ago

Sorry to hear that -- Poetry works fine for users who are able to ensure a consistent ML API situation across all their install targets. For Poetry to 'just work' across APIs and not require compromises like the proposed feature above, this is a topic for the PyPA, discuss.python.org, and a PEP defining ML APIs as a new wheel tag.

BlueskyFR commented 1 year ago

Is the issue with the secondary download URL being addressed? That would be the beginning of a solution.

neersighted commented 1 year ago

That's purely cosmetic -- it's a consequence of how additional sources are designed, and if you run pip in verbose mode with --extra-index-url you will see it does the same thing (setting neither secondary nor default duplicates pip's --extra-index-url, and default duplicates --index-url; secondary is still unconditionally searched and is a Poetry invention).

https://github.com/python-poetry/poetry/pull/5984#issuecomment-1237245571 is a proposal to solve this by breaking the 'purely pip-like' semantics of non-PyPI sources.

BlueskyFR commented 1 year ago

OK. Why does Poetry download the wheel file twice when specifying { url = "XXX" }? It is downloaded once during resolution and a second time to "upgrade" it by downloading the exact same file again.

neersighted commented 1 year ago

If the wheel wasn't installed by Poetry, it may be missing a PEP 610 marker, aka direct_url.json. That is to say, Poetry will only consider it the same torch version and not reinstall it if the marker exists and matches the URL that Poetry was configured with.
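For anyone who wants to check this on their own environment, here is a minimal sketch that reads the PEP 610 direct_url.json record for an installed distribution using only the standard library. The helper names are made up for this example, and the comparison is a simplification, not Poetry's actual logic; it only shows where the marker lives and what a missing marker looks like.

```python
import json
from importlib import metadata


def recorded_direct_url(dist_name):
    """Return the URL stored in direct_url.json, or None if there is none.

    None is returned both when the distribution is not installed and when
    it was installed without a PEP 610 record (e.g. from a package index).
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    raw = dist.read_text("direct_url.json")  # None when the file is absent
    if raw is None:
        return None
    return json.loads(raw).get("url")


def matches_configured_url(dist_name, configured_url):
    """Simplified check: does the recorded URL equal the configured one?"""
    return recorded_direct_url(dist_name) == configured_url


# Prints the recorded URL for torch, or None if there is no record.
print(recorded_direct_url("torch"))
```

If this prints None for a torch that was installed by another tool from a URL, that would be consistent with the reinstall behaviour described above.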

This is getting fairly off topic and turning into more of a support discussion (and I think the original issue was more of a question than anything actionable anyway) -- I'm migrating this to Discussions as such.