haskell / cabal

Official upstream development repository for Cabal and cabal-install
https://haskell.org/cabal
Other
1.61k stars 691 forks source link

Slow dependency resolution using solver option `--reorder-goals` #7784

Open newhoggy opened 2 years ago

newhoggy commented 2 years ago

Describe the bug Dependency resolution is very slow on Windows. For example here, it is taking 37 minutes:

image

To Reproduce Steps to reproduce the behavior:

On Windows, install msys2. Then in a bash shell:

git clone https://github.com/input-output-hk/libsodium
cd libsodium
git checkout 66f017f16633f2060db25e17c170c2afa0f2a8a1
./configure
make
make install
cd -

Then:

git clone git@github.com:input-output-hk/cardano-node.git
cd cardano-node
cabal build all --dry-run

Expected behavior A clear and concise description of what you expected to happen.

System information

Additional context N/A

Mikolaj commented 2 years ago

Do you know how that compares to cabal 3.4 and to 3.6.2 on Linux on the same machine?

gbaz commented 2 years ago

We put a lot of work into optimizing the solver in the last release, and we know it works reasonably on other platforms on this exact case. I'd be surprised if it was the solver itself taking more time on windows than other platforms -- there's nothing machine dependent in it. On the other hand, the linked git repos could be expensive to manage and checkout compared to other systems, or in fact traversing the entire directory tree to scan files for updates could be expensive, as windows has very different filesystem ops with different performance characteristics. A CI trace such as above gives me very little indication of where time is actually spent in such a setting, and unfortunately I don't have a windows box handy, so can't dig in further directly.

newhoggy commented 2 years ago

Do you know how that compares to cabal 3.4 and to 3.6.2 on Linux on the same machine?

I don't have a "same machine" comparison.

For comparison in Github Actions, the same build step with cabal-3.6.2 takes:

https://github.com/input-output-hk/cardano-node/runs/4057378590?check_suite_focus=true

For comparison in Github Actions, the same build step with cabal-3.4 takes:

https://github.com/input-output-hk/cardano-node/actions/runs/1399416440

newhoggy commented 2 years ago

I'd be surprised if it was the solver itself taking more time on windows than other platforms

It's possible it's not the solver, but it would at least be something that runs adjacent to the solver in that cabal sits on the print out Resolving dependencies... for a very long time.

unfortunately I don't have a windows box handy

Would a Virtual Box VM with everything setup help with this?

jneira commented 2 years ago

It's possible it's not the solver, but it would at least be something that runs adjacent to the solver in that cabal sits on the print out Resolving dependencies... for a very long time.

maybe setting a verbose output with -v3 could help to see what cabal is doing in that step with more detail

I have not get such ill behaviour building any package in my windows machines, will try to reproduce

newhoggy commented 2 years ago

I've attached timed v3 logs which should help identify where the slowness is coming from.

log.out.timed.txt

fgaz commented 2 years ago

Interesting... there's zero backtracking, and while 6232 steps is a lot, it shouldn't take a literal hour, so there must be some problem in the step themselves: some take over a second!

10:01:52 Environment: ...
...
10:05:22 [__0] trying: Win32-network-0.1.0.0 (user goal)
...
10:07:30 [797] trying: cardano-node-chairman:cardano-node:exe.goblins:-development
10:07:31 [798] trying: cardano-node-chairman:cardano-node:exe.cardano-prelude-test~>cardano-prelude-test-0.1.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-ledger-shelley-test)
10:07:32 [799] trying: cardano-node-chairman:cardano-node:exe.cardano-prelude-test:-development
10:07:33 [800] trying: cardano-node-chairman:cardano-node:exe.cardano-ledger-pretty~>cardano-ledger-pretty-0.1.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-ledger-shelley-test)
10:07:34 [801] trying: cardano-node-chairman:cardano-node:exe.cardano-crypto-tests~>cardano-crypto-tests-2.0.0 (dependency of cardano-node-chairman:cardano-node:exe.cardano-api *test)
...
10:45:04 [6231] trying: plutus-core:alex:exe.containers~>containers-0.6.2.1/installed-0.6.2.1
10:45:04 [6232] done
10:45:04 Elaborating the install plan...
...
10:56:29 In order, the following would be built:
coot commented 2 years ago

I am hit by this problem on linux with cabal-3.6.2.0 (ghc-8.10.7) on the same repo that @newhoggy reported.

coot commented 2 years ago

oh, I noticed that my cabal.project.local file was a source of the delay, specifically:

reorder-goals: True

(if included it takes a terrible hit on the dependency resolution time)

fgaz commented 2 years ago

Nice find @coot! It looks like the same option is set in cardano-node's windows workflow:

https://github.com/input-output-hk/cardano-node/blob/e456aad30c87ca7f215b3c4139b3ae049bf6ca9e/.github/workflows/cabal.project.local.Windows#L8

Compare to linux:

https://github.com/input-output-hk/cardano-node/blob/e456aad30c87ca7f215b3c4139b3ae049bf6ca9e/.github/workflows/cabal.project.local.Linux#L8

coot commented 2 years ago

Thanks @fgaz, I asked @newhoggy exactly that on our internal slack :).

coot commented 2 years ago

I am not sure if there is a regression here. Although, a few months ago this used to work cross platform, since that time cardano-node included another large component which for sure had an impact on the complexity of the problem.

Mikolaj commented 2 years ago

What's the time without reorder-goals? Is there any hint in the docs (inside cabal help and/or in the external manuals) that a slowdown should be expected?

jneira commented 2 years ago

time to change issue title and os label then?

coot commented 2 years ago

What's the time without reorder-goals?

Something in the order of 10-20s.

Is there any hint in the docs (inside cabal help and/or in the external manuals) that a slowdown should be expected?

Yes it does, and a slowdown is expected, ref, though a 2 order of magnitude slowdown is quite large (15s -> 1200s).

For me this sounds like a low priority, nice to have optimisation issue - but I don't know implantation details, maybe it's actually a reasonable tread-of between nice error message and performance. Regardless of this, the docs should be worded more carefully.

fgaz commented 2 years ago

The commentary ticket linked in the docs explains the behavior noted in https://github.com/haskell/cabal/issues/7784#issuecomment-956264130

https://github.com/haskell/cabal/issues/1780#issuecomment-41035934:

Let's clarify the situation a bit: --reorder-goals applies a heuristic that will typically mean that a valid solution (if one exist) appears significantly "earlier" in the search space. However, in doing so, it'll make traversing the search space more costly, because additional analysis is being performed at each step in determining which goals are more or less costly. [...] The extra analysis performed by --reorder-goals can potentially be sped by memoization.

https://github.com/haskell/cabal/issues/1780#issuecomment-330754654:

--reorder-goals still makes each step significantly slower

Mikolaj commented 2 years ago

Wow. I've added a warning to the ticket that is linked to from the user guide.

grayjay commented 2 years ago

--reorder-goals tries to make decisions (explore goals) with fewer choices earlier. In the worst case, it needs to consider every goal at every level, which is N^2 in the number of steps.

I think that --reorder-goals became a lot less useful after #5918 added --fine-grained-conflicts, which is an independent optimization that seems to help in a lot of cases where --reorder-goals was needed before. It doesn't have a significant per-step performance penalty, so it is enabled by default. Since the options are different optimizations, I assume that there are still some cases where --reorder-goals makes a solution possible, and we shouldn't remove it. We should update the documentation, especially removing the part about --reorder-goals working well for big install plans.

gbaz commented 2 years ago

Skimming through the related code, I think reorderGoals is a pretty expensive heuristic for what it gives, and I wonder if there's not a faster and more effective one lurking around.

Also the expensive function could be made much faster either by using a cruder approximation or just giving it a lazier search: https://github.com/haskell/cabal/blob/67c440a4085b73df5422420a213967450edccf1b/cabal-install-solver/src/Distribution/Solver/Modular/Tree.hs#L179

(edit: this is to say that at at minimum the reorderGoals heuristic can be left as is and made universally faster, but also that other better and even faster heuristics seem very possible!)