pdm-project / pdm

A modern Python package and dependency manager supporting the latest PEP standards
https://pdm-project.org
MIT License

Resolving dependencies taking FOREVER #1379

Closed nickchomey closed 2 years ago

nickchomey commented 2 years ago

I am trying to set up the farm-haystack application with PDM, but when I try to install from its pyproject.toml file, it takes hours to work through all the dependencies - it seems to iterate through every version of every dependency, presumably checking for conflicts. In particular, it is most noticeably slow when trying to install with the [all-gpu] optional dependency egg.

I presume that the pyproject.toml file could list more specific dependencies (e.g. >1.3.2, where many currently have no version constraint at all), but is there something within PDM that I could use to speed this up, even if it means not checking dependencies? If I just pip install within a standard venv environment, it obviously doesn't do any of this and is quite quick, but I'm therefore missing out on PDM's features.

SirMartin commented 2 years ago

I don't really understand your issue. I just created a project and added haystack (version 0.42) to it, and as you can see in the screenshot, adding and locking with PDM took 7 seconds. Maybe you have a conflict with other libraries. When locking takes a long time, try running with verbose output (the -v flag) to see what it is doing; if several libraries are competing over compatible versions, you may be able to fix it. Sometimes library A needs library C at a version lower than 0.4 while library B needs library C at a version higher than 0.8.

If you hit something like that, you can wait forever for PDM to find a version that matches.
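The A/B/C situation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the library names, the candidate versions, and the `parse` helper are hypothetical, and a real resolver handles far more cases (pre-releases, markers, backtracking) than this does.

```python
# Hypothetical example: library A needs C < 0.4, library B needs C > 0.8.
# Versions are compared as integer tuples, e.g. "0.4" -> (0, 4).

def parse(version: str) -> tuple:
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))

# Each requirement on C is a predicate over a parsed version.
requirements_on_c = {
    "library A": lambda v: v < parse("0.4"),
    "library B": lambda v: v > parse("0.8"),
}

# Made-up list of published versions of C.
candidates_for_c = ["0.1", "0.3", "0.5", "0.9", "1.0"]

# A candidate is usable only if every requirement accepts it.
compatible = [
    version
    for version in candidates_for_c
    if all(accepts(parse(version)) for accepts in requirements_on_c.values())
]

print(compatible)  # no version is both < 0.4 and > 0.8
```

With no version of C that satisfies both sides, a resolver has to back off and try other versions of A and B, which is where the time goes.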

image

nickchomey commented 2 years ago

@SirMartin Would you like to try again with the package that I referred and linked to in the OP - farm-haystack? And, specifically, with the optional dependencies [all-gpu] or [all] (which I neglected to mention in the OP).

But, yes, what you've described is probably what is happening - as I mentioned as well, there are many dependencies that don't have any versions listed and it seems to iterate through each version of each dependency. Which brings us back to my question - is there anything (e.g. a flag) that can be done with PDM to speed this up?

frostming commented 2 years ago

I tried farm-haystack[all-gpu] on Windows and didn't notice any obvious resolution loop, and it managed to finish after 11 minutes.

To find resolution loops, just add the -v flag and watch the log to see whether any packages appear periodically after Pinning <package_name>.
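Watching a long verbose log by eye is tedious, so the check above can be roughly automated. The sketch below is not part of PDM; it assumes the log has been saved to text and that pin lines look like "Pinning <name> <version>" or "new pin <name> <version>". The sample lines and the threshold are made up, and the pattern may need adjusting to the real output.

```python
import re
from collections import Counter

# Assumed line shape: "... Pinning <name> <version>" or "... new pin <name> <version>".
PIN_PATTERN = re.compile(r"\bpin(?:ning)?\s+([A-Za-z0-9._-]+)", re.IGNORECASE)


def count_pins(log_lines):
    """Count how many times each package name appears in a pin line."""
    counts = Counter()
    for line in log_lines:
        match = PIN_PATTERN.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def repeated_pins(log_lines, threshold=3):
    """Return packages pinned at least `threshold` times (possible loop)."""
    return {name: n for name, n in count_pins(log_lines).items() if n >= threshold}


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "pdm.termui: Pinning torch 1.12.1",
    "pdm.termui: Pinning numpy 1.23.3",
    "pdm.termui: Pinning torch 1.12.0",
    "pdm.termui: Pinning torch 1.11.0",
]

print(repeated_pins(sample_log))
```

A package that keeps showing up with different versions is a good place to start adding an explicit version constraint.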

SirMartin commented 2 years ago

I tested now with farm-haystack[all-gpu] and found two things. For me, it took 8 minutes 53 seconds without any issue.

I just found that this line took around 3 minutes to resolve:

pdm.termui: Adding requirement flatbuffers(from onnxruntime-gpu 1.12.1)

On the other hand, it takes 460 rounds to produce the lock, so the obvious problem is the huge number of dependencies and sub-dependencies. I have no clue how it can be optimized. I'm not sure whether specifying versions for some libraries would make it shorter, but I don't think so: as far as I know, it takes the latest one, and without conflicts that should not make any difference.

frostming commented 2 years ago

pdm.termui: Adding requirement flatbuffers(from onnxruntime-gpu 1.12.1)

This is not what it was doing when it was blocking, but the next package is. Maybe torch since PDM is downloading it.

SirMartin commented 2 years ago

Yes, I don't know what it means that it stays there, but the next package is numpy, and numpy is not the problem.

nickchomey commented 2 years ago

@frostming I tried farm-haystack[gpu-all]

It is [all-gpu] not [gpu-all] so you probably weren't actually installing all of the dependencies...

@SirMartin I tested now with farm-haystack[all-gpu] and found two things. For me, it took 8 minutes 53 seconds without any issue.

Are you sure it installed all of the optional dependencies? I had a lot of trouble (and I'm not sure I ever succeeded) in trying to install (nested) groups of optional dependencies. This is what Haystack suggests in their documentation:

git clone https://github.com/deepset-ai/haystack.git
cd haystack
pip install --upgrade pip
pip install -e '.[all]' ## or 'all-gpu' for the GPU-enabled dependencies

I tried pdm install --group all-gpu, and various other iterations and none seemed to work. I think I ended up modifying pyproject.toml to make all the desired optional deps part of the mandatory ones.

Is there a particular syntax or trick that I need to follow to make it work?

@SirMartin On the other hand, it takes 460 rounds to produce the lock, so the obvious problem is the huge number of dependencies and sub-dependencies. I have no clue how it can be optimized. I'm not sure whether specifying versions for some libraries would make it shorter, but I don't think so: as far as I know, it takes the latest one, and without conflicts that should not make any difference.

Yes, as I said, it just iterates through each version of each dependency, per dependency. You can see it with --verbose. Though, it only really stalls on particular deps rather than all of them. 460 seems FAR too low based on what I saw with all the optional dependencies included - it was surely more like 50000...

I already did and described above what you suggested here - specifying versions etc... - but it was still intolerably slow and a ton of work to find minimum versions to specify that didn't conflict with others. Besides, it seems to me that the whole point of PDM is to automate all of this. If that's simply not possible due to very un-optimized dependencies in pyproject.toml, so be it. I could try to advocate for Haystack to refine the dependencies list. But, still, I hope there is (or could be?) something within PDM that might allow for this to speed up - Haystack is surely not the only package that has un-optimized dependencies...

SirMartin commented 2 years ago

I tried the right one, with all-gpu. As far as I understand, after running pdm lock with it, the lock file should contain all the dependencies and sub-dependencies to install. I didn't try to install them.

About rounds and conflicts: I have had problems before where it took around 10 minutes trying all the versions, but the same problem also happened to me using pip.

Is it much faster when you use pip? To me it sounds totally crazy that locking the project needs so much time, but I don't know how many dependencies and sub-dependencies farm-haystack has.

Hopefully @frostming has some idea to help you!

nickchomey commented 2 years ago

Thanks for your efforts. Would you mind sharing the command you used to install it with all-gpu? As I said, I had a lot of difficulty figuring it out.

Yes, with pip it is a very reasonable speed for me.

Hopefully there will be a solution - I really like PDM, but it just isn't usable for this application as it is.

pawamoy commented 2 years ago

Is it possible that pip is faster because it already has everything in cache? If PDM needs to try multiple versions of torch, that's indeed several gigabytes it needs to download, which can take a very long time on slow internet connections.

frostming commented 2 years ago

It is [all-gpu] not [gpu-all] so you probably weren't actually installing all of the dependencies...

That was a typo; I did install the correct group, all-gpu. The command is pdm add "farm-haystack[all-gpu]" and you will see it listed in the dependencies field in pyproject.toml. PDM will fetch the packages required by farm-haystack[all-gpu], not all, and lock the versions in pdm.lock. To me, 8 min is a reasonable speed given so many requirements.

nickchomey commented 2 years ago

I just tried again with that command and it took about 8 min for me as well. Perhaps the issue is that I was following the instructions from Haystack and cloning the repo and then doing some sort of pdm install from the pyproject.toml that is included, rather than adding the package with pdm add. I tried a lot of things, so don't quite remember.

Anyway, one more question if you don't mind:

How can I install all-gpu from the repo so as to get the most recent commit? I tried this but it didn't work. I assume I've used the wrong syntax again.

pdm add "git+https://github.com/deepset-ai/haystack.git#egg=all-gpu"

Perhaps a bit more documentation with regards to installing groups from packages/repos would be helpful?

Thanks again for your help and for the great tool!

frostming commented 2 years ago

How can I install all-gpu from the repo so as to get the most recent commit? I tried this but it didn't work. I assume I've used the wrong syntax again.

Either syntax is okay:

PDM's docs don't say much about that part because it is covered in the corresponding standards and docs, and we only leave a reference link to them.
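For reference, one of the standards in question, PEP 508, writes a direct reference as `name[extras] @ url`, where the name is the distribution name (here farm-haystack) and the extras go in brackets after it. The helper below is a hypothetical sketch that only formats such a string; it is not a PDM API, and whether a given PDM version accepts the result for this repository is an assumption to verify.

```python
def direct_reference(name, url, extras=()):
    """Format a PEP 508 direct reference: name[extra1,extra2] @ url."""
    extras_part = f"[{','.join(extras)}]" if extras else ""
    return f"{name}{extras_part} @ {url}"


# Distribution name and repository URL taken from the thread; the helper is illustrative.
requirement = direct_reference(
    "farm-haystack",
    "git+https://github.com/deepset-ai/haystack.git",
    extras=["all-gpu"],
)

print(requirement)
```

The printed string is the kind of requirement one would pass, quoted, to pdm add. Note that in the earlier attempt `#egg=all-gpu` put the extras group where the package name is expected.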

UnoYakshi commented 1 year ago

Having a similar issue.

Trying to pdm add fastapi-users-db-sqlmodel to the existing project. Stuck at Resolving: new pin pycodestyle 2.10.0. PDM version: 2.5.3.

[project]
dependencies = [
    "alembic>=1.10.4",
    "asyncpg[sa]>=0.27.0",
    "fastapi[orjson,ujson]>=0.95.1",
    "pydantic[dotenv]>=1.10.7",
    "uvicorn[standart]>=0.22.0",
    "ujson>=5.7.0",
    "orjson>=3.8.11",
    "sqlalchemy[asyncio,mypy]==1.4.41",
    "sqlmodel>=0.0.8",
    "jinja2>=3.1.2",
    "fastapi-users>=11.0.0",
]
requires-python = ">=3.11"

[tool.pdm]
[tool.pdm.dev-dependencies]
dev = [
    "pytest>=7.3.1",
    "pytest-asyncio>=0.21.0",
    "httpx>=0.24.0",
    "pre-commit>=3.2.2",
    "black>=23.3.0",
    "isort>=5.12.0",
    "mypy>=1.2.0",
    "flake8>=6.0.0",
    "autoflake>=2.1.1",
    "sqlalchemy-stubs>=0.4",
    "flake8-bugbear>=23.3.23",
    "pep8-naming>=0.13.3",
    "ruff>=0.0.263",
]

UnoYakshi commented 1 year ago

Ah, looks like the latest version of fastapi-users-db-sqlmodel has a strict requirement of sqlalchemy==1.4.35. However, PDM doesn't say so explicitly; it just repeats the same action over and over.

pdm.termui: Candidate rejected: sqlalchemy@1.4.35 because it introduces a new requirement sqlalchemy==1.4.35 that conflicts with other requirements:
    sqlalchemy==1.4.41 (from sqlalchemy@1.4.41)
    SQLAlchemy>=1.3.0 (from alembic@1.10.4)
    SQLAlchemy<=1.4.41,>=1.4.17 (from sqlmodel@0.0.8)
pdm.termui:   Adding requirement greenlet!=0.4.17; python_version >= "3"(from sqlalchemy 1.4.35)
pdm.termui:   Adding requirement sqlalchemy==1.4.35(from sqlalchemy 1.4.35)
pdm.termui: Candidate rejected: sqlalchemy@1.4.35 because it introduces a new requirement sqlalchemy==1.4.35 that conflicts with other requirements:
    sqlalchemy==1.4.41 (from sqlalchemy@1.4.41)
    SQLAlchemy>=1.3.0 (from alembic@1.10.4)
    SQLAlchemy<=1.4.41,>=1.4.17 (from sqlmodel@0.0.8)
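Since the log only repeats itself rather than naming the culprit, a rough way to surface it is to tally the "Candidate rejected" lines. The sketch below assumes the verbose output has been captured as text and that the line format matches the excerpt above; the function name is hypothetical and nothing here is part of PDM.

```python
import re
from collections import Counter

# Matches lines such as "Candidate rejected: sqlalchemy@1.4.35 because ..."
REJECTED = re.compile(r"Candidate rejected: (\S+)@(\S+) because")


def count_rejections(log_text):
    """Count how often each (package, version) candidate was rejected."""
    return Counter(REJECTED.findall(log_text))


# Shortened excerpt of the log shown above.
sample = """\
pdm.termui: Candidate rejected: sqlalchemy@1.4.35 because it introduces a new requirement sqlalchemy==1.4.35 that conflicts with other requirements:
    sqlalchemy==1.4.41 (from sqlalchemy@1.4.41)
pdm.termui:   Adding requirement sqlalchemy==1.4.35(from sqlalchemy 1.4.35)
pdm.termui: Candidate rejected: sqlalchemy@1.4.35 because it introduces a new requirement sqlalchemy==1.4.35 that conflicts with other requirements:
    sqlalchemy==1.4.41 (from sqlalchemy@1.4.41)
"""

for (package, version), times in count_rejections(sample).most_common():
    print(f"{package}@{version} rejected {times} times")
```

A candidate that is rejected again and again, like sqlalchemy@1.4.35 here, points at the pair of pins that cannot both hold (==1.4.35 against the project's ==1.4.41).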