python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License

Poetry cannot correctly select dependencies #5896

Closed mihirsamdarshi closed 3 days ago

mihirsamdarshi commented 2 years ago
[tool.poetry]
name = "poetry-fail"
version = "0.0.1"
description = "Repro of project that doesn't work"
authors = ["mihirsamdarshi"]

[tool.poetry.dependencies]
python = "^3.10"
bjoern = "^3.2.1"
boto3 = "^1.24.14"
boto3-stubs = { version = "^1.24.14", extras = ["ec2", "s3"] }
caper = "^2.2.0"
Flask = "^2.1.2"
Flask-Cors = "^3.0.10"
Flask-RESTful = "^0.3.9"
Flask-SQLAlchemy = "^2.5.1"
google-cloud-storage = "^2.4.0"
pandas = "^1.4.2"
PyMySQL = "^1.0.2"
requests = "^2.28.0"
smart-open = "^6.0.0"
SQLAlchemy = "^1.4.37"
Werkzeug = "^2.1.2"
wsgi-request-logger = "^0.4.6"

[tool.poetry.dev-dependencies]
pytest = "^7.1.2"
black = "^22.3.0"
pylint = "^2.14.3"
hypothesis = "^6.47.0"
jupyter = "^1.0.0"
flake8 = "^4.0.1"

Issue

With this particular pyproject.toml, Poetry is unable to select a version of awscli, regardless of whether I run poetry update, poetry install, or poetry lock.

When running with -vvv it hangs with the following repeated message:

   1: derived: not awscli (==1.21.1)
   1: fact: awscli (1.21.0) depends on botocore (1.22.0)
   1: fact: awscli (1.21.0) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.21.0) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.21.0) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.21.0) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.21.0) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.21.0)
   1: fact: awscli (1.20.65) depends on botocore (1.21.65)
   1: fact: awscli (1.20.65) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.65) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.65) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.65) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.65) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.65)
   1: fact: awscli (1.20.64) depends on botocore (1.21.64)
   1: fact: awscli (1.20.64) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.64) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.64) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.64) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.64) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.64)
   1: fact: awscli (1.20.63) depends on botocore (1.21.63)
   1: fact: awscli (1.20.63) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.63) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.63) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.63) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.63) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.63)
   1: fact: awscli (1.20.62) depends on botocore (1.21.62)
   1: fact: awscli (1.20.62) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.62) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.62) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.62) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.62) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.62)
   1: fact: awscli (1.20.61) depends on botocore (1.21.61)
   1: fact: awscli (1.20.61) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.61) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.61) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.61) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.61) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.61)
   1: fact: awscli (1.20.60) depends on botocore (1.21.60)
   1: fact: awscli (1.20.60) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.60) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.60) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.60) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.60) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.60)
   1: fact: awscli (1.20.59) depends on botocore (1.21.59)
   1: fact: awscli (1.20.59) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.59) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.59) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.59) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.59) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.59)
   1: fact: awscli (1.20.58) depends on botocore (1.21.58)
   1: fact: awscli (1.20.58) depends on docutils (>=0.10,<0.16)
   1: fact: awscli (1.20.58) depends on s3transfer (>=0.5.0,<0.6.0)
   1: fact: awscli (1.20.58) depends on PyYAML (>=3.10,<5.5)
   1: fact: awscli (1.20.58) depends on colorama (>=0.2.5,<0.4.4)
   1: fact: awscli (1.20.58) depends on rsa (>=3.1.2,<4.8)
   1: derived: not awscli (==1.20.58)
dimbleby commented 2 years ago

That's not a "repeated" message: poetry is (slowly) working its way through the many versions of awscli that are not compatible with the selections it has previously made at that point in the search.

There are some performance improvements in the latest beta that might help a bit, but your best bet is almost certainly to specify that you want some recent version of awscli. That will allow poetry to fail much faster - and backtrack the search and find a solution.
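
To make the effect of a lower bound concrete, here is a minimal sketch. It assumes a made-up release history (the version lists and the helper names `parse` and `candidates` are hypothetical) and only models the size of the search space, not how Poetry itself filters or orders versions.

```python
from __future__ import annotations

# Toy model: how a lower bound shrinks the set of versions a resolver may visit.
# The version numbers below are invented for illustration, not real awscli data.


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.25.26' into (1, 25, 26) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def candidates(all_versions: list[str], lower_bound: str | None = None) -> list[str]:
    """Return the versions allowed by an optional '>=' bound, newest first."""
    allowed = [
        v for v in all_versions
        if lower_bound is None or parse(v) >= parse(lower_bound)
    ]
    return sorted(allowed, key=parse, reverse=True)


# Hypothetical release history: 1.20.0 .. 1.20.99 and 1.25.0 .. 1.25.29
history = [f"1.20.{p}" for p in range(100)] + [f"1.25.{p}" for p in range(30)]

unbounded = candidates(history)
bounded = candidates(history, lower_bound="1.25.26")

print(len(unbounded))  # 130 candidates to walk through in the worst case
print(len(bounded))    # 4 candidates, so a dead end is detected quickly
```

With the bound in place, a search that has to reject every candidate gives up after 4 versions instead of 130, which is the "fail much faster" behaviour described above.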

mihirsamdarshi commented 2 years ago

Thanks, I meant that it was repeatedly trying to solve. Adding in the awscli dev and figuring out where the conflict lay did it.

mm-matthias commented 2 years ago

We have had this problem for more than half a year. Last time, clearing the Poetry caches helped; this time we had to do more. In our case we had PyYAML = "*" and awscli = "*" as dependencies. Poetry got stuck just as the OP described. When we removed PyYAML, the (seemingly) infinite loop ended and the poetry.lock file was created in seconds. The other fix that worked was to use awscli = ">=1.25.26", which cut that loop short. This is basically the same solution that @dimbleby suggested.

mm-matthias commented 2 years ago

I checked with the latest poetry version 1.2.0b2 and it did not help. Setting a minimum version for awscli was the only real solution for us.

zyv commented 1 year ago

@mihirsamdarshi @mkniewallner could you please explain why you closed this issue?

We are still facing the same problem with the latest Poetry release, and it is reliably reproducible. The changes in the other issues didn't solve the problem, and there is no other linked issue that is open.

If there is no other issue tracking this performance issue, could you please reopen this one?

zaytsev@parallels:~$ poetry --version
Poetry (version 1.4.2)
camerondavison commented 1 year ago

I am not sure whether this is the problem for this specific issue, but I am on Poetry 1.5.1 and can get Poetry to spin for a long time in a brand-new project by just running

poetry init -n
poetry add 'urllib3@*' 'boto3@^1'

My understanding is that if any package selects a urllib3 version that conflicts with botocore (i.e. >= 2.0), then you're stuck downloading every boto3 release between the first time you added the library (because Poetry put ^1.x.x on whatever day you originally added it) and now (boto3 publishes new versions very often).

Would it be possible to do some kind of shallow discovery for libraries that publish a lot of versions, in order to select a new transitive dependency version? In this case it still resolves after a long time of downloading hundreds of boto versions, but only because it rules out all versions of boto and restarts at the top with a lower urllib3 version.
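
A rough sketch of why the order of decisions matters so much here. This is a naive backtracking toy with invented data, not Poetry's real solver (which, as far as I know, is a PubGrub implementation and learns from conflicts). The names `solve` and `compatible`, the version lists, and the rule that every boto3 release needs urllib3 < 2 are all assumptions for illustration.

```python
from __future__ import annotations

# Toy backtracking search, NOT Poetry's real solver. All data is invented.

N_BOTO3 = 500  # hypothetical number of boto3 releases inside the allowed range

boto3_versions = [f"1.0.{i}" for i in range(N_BOTO3, 0, -1)]  # newest first
urllib3_versions = ["2.0.0", "1.26.0"]  # newest first


def compatible(boto3: str, urllib3: str) -> bool:
    """Toy rule: every boto3 release in this model requires urllib3 < 2."""
    return urllib3.startswith("1.")


def solve(first: str) -> tuple[tuple[str, str] | None, int]:
    """Fix the package named `first` first, then the other one.

    Returns the (boto3, urllib3) pair found and the number of boto3
    candidates whose metadata had to be looked at along the way.
    """
    lookups = 0
    if first == "urllib3":
        for u in urllib3_versions:
            for b in boto3_versions:
                lookups += 1  # each boto3 candidate costs one metadata lookup
                if compatible(b, u):
                    return (b, u), lookups
        return None, lookups
    for b in boto3_versions:
        lookups += 1
        for u in urllib3_versions:
            if compatible(b, u):
                return (b, u), lookups
    return None, lookups


print(solve("urllib3"))  # (('1.0.500', '1.26.0'), 501)
print(solve("boto3"))    # (('1.0.500', '1.26.0'), 1)
```

Both orders end at the same answer, but fixing urllib3 first walks through all 500 toy boto3 releases before backtracking, while fixing boto3 first needs a single lookup.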

finswimmer commented 3 days ago

Dependency resolution is known to be quite slow if awscli and/or boto3 are involved, as there is an enormous number of versions.

The options for improving this on Poetry's side are limited. The best approach, as others explained above, is to limit the version range that Poetry should try.

zyv commented 3 days ago

@finswimmer, would you please be so kind as to comment on what exactly limits Poetry's options to improve this, so that people facing this problem can understand why it's closed as not planned and they have to introduce an artificial lower limit to limit the packages that are scanned?

My vague understanding was that there is no metadata service, so in some cases Poetry has to download every package release, going back to the very first one, just to extract the metadata and establish that it doesn't match, and this causes the slowness.

Is this understanding correct? If so, would it make sense to bring this up with the PyPI / PyPA people?

zyv commented 3 days ago

P.S. Just saw that in #8823 @dimbleby explained:

it's just good or bad luck whether your solver happens first to explore a path that fixes boto3 first (when it is very easy to find a satisfactory urllib3) or a path that fixes urllib3 first (when it is very hard to find a satisfactory boto3)

the most useful thing you can do right now, for future-you and the rest of the ecosystem, is to go and offer merge requests to django-distill or fiftyone or whoever, putting a (recent) lower bound on their boto3 dependency.

Then no installer is exposed to having to backtrack through the thousands of versions of boto3 that amazon release

I think that's as close to an explanation as I can get, but I'm still confused as to why backtracking through thousands of versions is a problem. It seems to me that if you have all the relevant constraints at hand, it shouldn't be difficult to solve. So is it true that constraints can only be obtained by downloading the packages in question, and that's the root cause of the issue?

radoering commented 3 days ago

So is it true that constraints can only be obtained by downloading the packages in question, and that's the root cause of the issue?

It depends:

zyv commented 3 days ago

So is it true that constraints can only be obtained by downloading the packages in question, and that's the root cause of the issue?

It depends:

  • If you use PyPI as index, then dependencies of a package can be obtained without downloading the package if wheels are provided (thanks to the PEP 658 backfill). If it is an sdist only release, then (in most cases) the sdist has to be downloaded. However, even without downloading wheels or sdists, it still takes some time to backtrack thousands of versions and fetch the metadata of each version.

But OP is using PyPI and boto3 provides wheels, so the problem is really in the efficiency of Python code in this case?
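
For a feel of the numbers, a back-of-envelope sketch. PEP 658 serves a distribution's metadata at the file URL with `.metadata` appended; everything else here (the example URL, the helper names, the 2000 versions and 100 ms per request) is an assumption for illustration, not a measurement of Poetry or PyPI.

```python
def metadata_url(file_url: str) -> str:
    """PEP 658: metadata lives at the distribution file's URL plus '.metadata'."""
    return file_url + ".metadata"


def sequential_fetch_seconds(n_versions: int, ms_per_request: int) -> float:
    """Rough cost of one metadata request per version, made one after another."""
    return n_versions * ms_per_request / 1000


# Hypothetical wheel URL; real PyPI file URLs contain a hash-based path.
wheel = "https://files.example.invalid/packages/boto3-1.0.0-py3-none-any.whl"
print(metadata_url(wheel))

# Assumed numbers only: 2000 versions at 100 ms per request.
print(sequential_fetch_seconds(2000, 100))  # 200.0
```

So even when no wheel is downloaded, a few thousand small sequential requests could plausibly add up to minutes, before any solver time is counted.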

radoering commented 3 days ago

Not sure if it is about Python or just network requests or the algorithm itself (independent from the programming language).

Secrus commented 3 days ago

TL;DR there are many moving parts between network, algorithms, and Python, but boto having daily releases and a ton of versions to check doesn't help.

zyv commented 3 days ago

Not sure if it is about Python or just network requests or the algorithm itself (independent from the programming language).

Well, I think it makes a huge difference.

I remember my colleagues complaining about two years ago that Poetry was saturating our 500 Mbit link, so it definitely felt like it was downloading half of the world. If nowadays it's "just" network requests and/or resolution code, then in theory something could be done about it on the Poetry side.

But I see, I guess I can only get these detailed answers by looking at the code myself :( Thank you for the pointers!