python-poetry / poetry

Python packaging and dependency management made easy
https://python-poetry.org
MIT License
31.55k stars 2.26k forks source link

speed-up by early pruning based on markers #4952

Open maciejskorski opened 2 years ago

maciejskorski commented 2 years ago

Feature Request

As of now poetry spends a lot of time on overrides which doesn't match the environmental version or platform requirements. They are filtered in the very end of the process, after writing possible scenarios to the lock file. It would be great to have early pruning of such overrides, particularly as more and more packages go for conditional logic in their requriements. The issue fits the broader call for better support of markers, such as #4670

Consider this example run within a virtual environment with Python 3.8

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    {version="1.3.*",markers="python_version=='3.9.*'"},
    {version="1.2.*",markers="python_version=='3.7.*'"},
    {version="1.1.*",markers="python_version=='3.10.*'"},
]
....

Petry correctly concludes the empty set of matching packages, however it splits the graph in 6 branches (due to explicit version split and one more implicit in pandas numpy dependency) - and this escalates the more markers we have. It would help to compare the markers early and ignore some of the branches.

NOTE: this is a broader call for being able to support restricted target environments and narrow the lock to prescribed architectures, python versions and so on.

maciejskorski commented 2 years ago

Here are my further thoughts once I got familiar with the code:

Install configs, written to the .lock file are platform agnostic, e.g. cover all possible scenarios. This exponentially impacts the running time in some scenarios (combine pandas, black, and azureml in its modern releases).

Adding pre-filtering to match the target markers seems easy, but it's going to change the way log file looks like so much more work on the tests side - as they test not only the picked up instructions but also the content file (hash included)

radoering commented 2 years ago

The example is only a special case of the issue, isn't it? Actually, in the example all of the pandas constraints should be ignored because they don't match the project's python constraint. (I would almost consider it a bug that this currently is not the case.) If the python constraint would be "^3.7" instead of "~3.8", it would be correct to consider all pandas overrides when locking (no matter in which environment).

I think, in the example you can even force the desired behaviour by python instead of markers:

[tool.poetry.dependencies]
python = "~3.8"
pandas = [
    {version="1.3.*",python="3.9.*"},
    {version="1.2.*",python="3.7.*"},
    {version="1.1.*",python="3.10.*"},
]

When using this syntax, the pandas constraints are ignored...

In my opinion, creating one lock file for all platforms and python versions on an arbitrary platform with one of the supported python versions (together with multiple constraint dependencies in the pyproject.toml) is one of the most important features of poetry. Further, once the lock file has been created, installation does not require locking again.

maciejskorski commented 2 years ago

The example is only a special case of the issue, isn't it?

Yes, of course. We want to pre-filter more important markers such as OS or architecture. They often appear in modern ML stack (recent releases of pandas, azure, black...). Sorry if that was not clear, I can bring up a more devastating example:-)

I think, in the example you can even force the desired behaviour by python instead of markers:

Not really, the behaviour of python constraints is known to have bugs when markers appear. I really want to support markers. Like I said, markers appear in data-science or ML stack and supporting them is important.

In my opinion, creating one lock file for all platforms and python versions on an arbitrary platform with one of the supported python versions (together with multiple constraint dependencies in the pyproject.toml) is one of the most important features of poetry. Further, once the lock file has been created, installation does not require locking again.

That's the case of mature packages. I want to support coders which develop internal dependencies and debug intensively, within a team, switching and updating dependencies often. Then updating or recreating happens often and is really time-consuming.

Yes this is great, that the lock is system agnostic:-) However, this makes poetry slow due to exponentially many possibilities when building bigger packages. Use-cases are in between: say we develop for restricted platforms or restricted architectures, restricted python versions or other subset of markers. This is a call for constraining the target environment for valid reasons, could be done in the toml file as in the PR #4956. Of course, as an option. Let other enjoy the poetry being environment agnostic :-)

hungbie commented 2 years ago

+1 for this. We have a big project in our team, each time poetry solves the dependencies it already takes more than 10 minutes (each try). However, we know beforehand the version we want as well as the platform, so retries to have a solution that valid everywhere is redundant and a big waste of resources. Currently in our project we have to retries around 20 times, making the whole process super slow, sometimes can be upto 4 hours.

jfyu commented 2 years ago

+1 for this as well. pandas and numpy are ubiquitous to data science and this issue is slowing down our microservice project build.