Improving Cabal Solver Output for Better Readability and Usability

yvan-sraka commented 1 year ago

Hello everyone,

At IOG, we have repeatedly faced issues with the Cabal solver output, as recently mentioned by @angerman. We lose valuable time trying to locate crucial information that should be more prominently displayed.

Problem

Consider the following trace:

ouroboros-network> [__0] trying: ouroboros-network-0.6.0.0 (user goal)
ouroboros-network> [__1] trying: ouroboros-network-framework-0.5.0.0 (dependency of
ouroboros-network> ouroboros-network)
ouroboros-network> [__2] trying: ouroboros-network-testing-0.2.0.1 (dependency of
ouroboros-network> ouroboros-network-framework)
ouroboros-network> [__3] next goal: io-classes (dependency of ouroboros-network)
ouroboros-network> [__3] rejecting: io-classes-1.1.0.0 (conflict: ouroboros-network-testing =>
ouroboros-network> io-classes^>=0.3)
ouroboros-network> [__3] skipping: io-classes-1.0.0.1, io-classes-1.0.0.0, io-classes-0.6.0.0,
ouroboros-network> io-classes-0.5.0.0, io-classes-0.4.0.0 (has the same characteristics that
ouroboros-network> caused the previous version to fail: excluded by constraint '^>=0.3' from
ouroboros-network> 'ouroboros-network-testing')
ouroboros-network> [__3] rejecting: io-classes-0.3.0.0 (conflict: ouroboros-network =>
ouroboros-network> io-classes^>=1.1)
ouroboros-network> [__3] skipping: io-classes-0.2.0.0 (has the same characteristics that caused
ouroboros-network> the previous version to fail: excluded by constraint '^>=1.1' from
ouroboros-network> 'ouroboros-network')
ouroboros-network> [__3] fail (backjumping, conflict set: io-classes, ouroboros-network,
ouroboros-network> ouroboros-network-testing)
ouroboros-network> After searching the rest of the dependency tree exhaustively, these were the
ouroboros-network> goals I've had most trouble fulfilling: base, ouroboros-network, io-classes,
ouroboros-network> ouroboros-network-testing, ouroboros-network-framework
ouroboros-network> Try running with --minimize-conflict-set to improve the error message.
ouroboros-network> )

We need to look closely to identify the real issue:

ouroboros-network-0.6.0.0
+-> ouroboros-network-framework
| +-> ouroboros-network-framework-0.5.0.0 [selected]
|   +-> ouroboros-network-testing-0.2.0.1 [selected]
+-> io-classes
  +-> io-classes-1.1.0.0 [fail; ouroboros-network-testing => io-classes^>=0.3]
  +-> io-classes-0.3.0.0 [fail; ouroboros-network => io-classes^>=1.1]

N.B.: ouroboros-network-testing-0.3.0.0 has not been released with ouroboros-network-0.6.0.0, causing the resolution to fail. The issue was addressed in this commit on coot/ouroboros-network-0.6.0.0.
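
To make the manual reading step above concrete, here is a minimal Python sketch (not part of cabal) that strips the "ouroboros-network> " prefix, re-joins the wrapped lines, and pulls out the two "rejecting" entries that matter. The helper names unwrap and conflicts are hypothetical, and the regular expressions assume only the log shape shown in the trace above.

```python
import re

# A shortened copy of the trace above, including the "pkg> " prefix and the
# hard line wrapping added by the build tool.
RAW_LOG = """\
ouroboros-network> [__3] next goal: io-classes (dependency of ouroboros-network)
ouroboros-network> [__3] rejecting: io-classes-1.1.0.0 (conflict: ouroboros-network-testing =>
ouroboros-network> io-classes^>=0.3)
ouroboros-network> [__3] rejecting: io-classes-0.3.0.0 (conflict: ouroboros-network =>
ouroboros-network> io-classes^>=1.1)
"""

PREFIX = re.compile(r"^\S+> ")      # e.g. "ouroboros-network> "
STEP = re.compile(r"^\[_*\d+\] ")   # e.g. "[__3] "
REJECT = re.compile(r"rejecting: (\S+) \(conflict: (\S+) => (.+)\)$")


def unwrap(raw):
    """Drop the package prefix and merge wrapped lines into one entry per step."""
    entries = []
    for line in raw.splitlines():
        text = PREFIX.sub("", line)
        if STEP.match(text) or not entries:
            entries.append(text)          # a new "[__N] ..." solver step
        else:
            entries[-1] += " " + text     # continuation of the previous step
    return entries


def conflicts(raw):
    """Return (rejected candidate, constraining package, constraint) triples."""
    found = []
    for entry in unwrap(raw):
        match = REJECT.search(entry)
        if match:
            found.append(match.groups())
    return found


for candidate, source, constraint in conflicts(RAW_LOG):
    print(f"{candidate}: rejected because {source} requires {constraint}")
```

Even this small filter surfaces the two contradictory bounds on io-classes, which is the information the tree above was drawn from by hand.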

Proposed Solution

I suggest enhancing the solver output to be more readable and easier to parse. This has already been proposed:

https://github.com/haskell/cabal/issues/8475.

One improvement to this proposal would be to mark the latest known version of a package to the solver, as described here:

https://github.com/haskell/cabal/issues/1751.

This would overall contribute to a better user experience. WDYT?

Related Issues

Implementing this proposal would also be an opportunity to address:

Further enhancements could be made by addressing:

In summary, we believe that enhancing the Cabal solver output will significantly improve the user experience and save developers time. I plan to implement most of these features in a fork over the next few weeks and hope to get input from the community 🙂

ulysses4ever commented 1 year ago

Godspeed! We'll be sure to review and accept in a timely manner!

andreabedini commented 1 year ago

@yvan-sraka :clap: thank you for reopening this discussion. I wonder how we are going to visualise the backtracking though. In your example situation:

ouroboros-network-0.6.0.0
+-> ouroboros-network-framework
| +-> ouroboros-network-framework-0.5.0.0 [selected]
|   +-> ouroboros-network-testing-0.2.0.1 [selected]
+-> io-classes
  +-> io-classes-1.1.0.0 [fail; ouroboros-network-testing => io-classes^>=0.3]
  +-> io-classes-0.3.0.0 [fail; ouroboros-network => io-classes^>=1.1]

IIRC the solver would backtrack on the ouroboros-network-testing-0.2.0.1 choice and try other versions. Only if it cannot find any working solution it would fail. But I don't see it doing that backtracking in the output you copied above :thinking:

grayjay commented 1 year ago

> IIRC the solver would backtrack on the ouroboros-network-testing-0.2.0.1 choice and try other versions. Only if it cannot find any working solution it would fail. But I don't see it doing that backtracking in the output you copied above

Yes, the solver would need to consider all versions of each package before failing. Unfortunately, cabal only prints the first conflicts by default. These conflicts show why cabal couldn't use the most preferred versions, but not why it failed overall. -v3 shows all of the backtracking. I think it would be great to show the full log in cases where there are very few conflicting packages and the log is relatively short. Issues #5647 and #4251 relate to simplifying the output.

yvan-sraka commented 1 year ago

> @yvan-sraka :clap: thank you for reopening this discussion. I wonder how we are going to visualise the backtracking though. In your example situation:
>
> ouroboros-network-0.6.0.0
> +-> ouroboros-network-framework
> | +-> ouroboros-network-framework-0.5.0.0 [selected]
> |   +-> ouroboros-network-testing-0.2.0.1 [selected]
> +-> io-classes
>   +-> io-classes-1.1.0.0 [fail; ouroboros-network-testing => io-classes^>=0.3]
>   +-> io-classes-0.3.0.0 [fail; ouroboros-network => io-classes^>=1.1]
>
> IIRC the solver would backtrack on the ouroboros-network-testing-0.2.0.1 choice and try other versions. Only if it cannot find any working solution it would fail. But I don't see it doing that backtracking in the output you copied above :thinking:

Indeed! My current thought is to follow the proposal made by @chshersh in #8475. This approach visualizes the backtracking in a way that is easy to scan by eye:

cabal: [build-plan-error]
    Couldn't find a valid build plan that satisfies constraints for all packages

Build plan details:

❌ sauron-0.0.0.0
├── ✅ base-4.16.3.0 (GHC 9.2.4)
├── ✅ colourista-0.1.0.1
└── ❌ time  <==  Problems satisfying this dependency
    ├── ❌ time-1.11.1.1 (boot version)
    │    └── sauron  ==>  time ^>= 1.7  (incompatible bounds)
    │
    ├── ❌ time-1.7.0.1
    │   └── time  ==>  base >= 4.7 && < 4.13 (conflicts with: base-4.16.3.0)
    │
    └── ❌ time (skipping 32 other versions)
         └── sauron  ==>  time ^>= 1.7  (incompatible bounds)

Suggestions to fix the problem:

  * Edit bounds for your dependencies
  * Try running with '--minimize-conflict-set' to improve the error message

However, I acknowledge, as mentioned by @Mikolaj in the original RFC, that this visualization isn't universally clear. It may need further tweaking, improvement, or documentation. I must say, I'm fond of the “skipping 32 other versions” heuristic, but it may lack an explicit hint on how to get the verbose output, such as a --full-backtrack-trace option. WDYT?
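
As a rough illustration of how such a tree could be produced, here is a minimal Python sketch that renders a nested structure with the same box-drawing characters as the mock-up. It is not the #8475 implementation: the Node type, the render function, and the "[ok]"/"[fail]" labels are hypothetical, and the example data is copied from the mock-up above rather than from a real solver run.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One line of the tree: a label plus any nested lines (hypothetical type)."""
    label: str
    children: list = field(default_factory=list)


def render(node, prefix="", is_last=True, is_root=True):
    """Return the tree as a list of lines drawn with box-drawing characters."""
    if is_root:
        lines = [node.label]
        child_prefix = ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + node.label]
        # Below a last child the column stays empty; otherwise the bar continues.
        child_prefix = prefix + ("    " if is_last else "│   ")
    for index, child in enumerate(node.children):
        last = index == len(node.children) - 1
        lines.extend(render(child, child_prefix, last, False))
    return lines


plan = Node("[fail] sauron-0.0.0.0", [
    Node("[ok] base-4.16.3.0"),
    Node("[ok] colourista-0.1.0.1"),
    Node("[fail] time", [
        Node("[fail] time-1.11.1.1", [Node("sauron ==> time ^>= 1.7")]),
        Node("[fail] time-1.7.0.1", [Node("time ==> base >= 4.7 && < 4.13")]),
    ]),
])

print("\n".join(render(plan)))
```

The rendering itself is the easy part; the open question in this thread is which solver events should become nodes in the first place.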

> Yes, the solver would need to consider all versions of each package before failing. Unfortunately, cabal only prints the first conflicts by default. These conflicts show why cabal couldn't use the most preferred versions, but not why it failed overall. -v3 shows all of the backtracking. I think it would be great to show the full log in cases where there are very few conflicting packages and the log is relatively short. Issues #5647 and #4251 relate to simplifying the output.

I understand from the conversation in the linked issues that a more useful solver output is already available (with --minimize-conflict-set), but it's disabled by default due to a performance flaw. We want to at least measure and display this computational cost with #4594 (do you have in mind some sort of progress bar?). But here's a potentially naive question: the cost is only incurred when the solver fails, right? It doesn't slow down successful builds, only the faulty ones, correct?

Could we consider a scenario where cabal fails quickly with a minimal message (potentially without any solver output), but encourages the user to rerun the command with, e.g., a --diagnostic flag (which, AFAIU, is what -v3 currently stands in for?). This flag would then trigger the computation of the minimal conflict set, regardless of its slow performance.

To clarify, my underlying suggestion is that, for a better developer experience, we might prefer no solver output and a clear error message that encourages the user to seek a comprehensive solver output diagnostic. This approach could be preferable to having a confusing output by default, which may mislead a user who may not instinctively use the correct diagnostic flags to debug their issue. What are your thoughts on this?

grayjay commented 1 year ago

> I understand from the conversation in the linked issues that a more useful solver output is already available (with --minimize-conflict-set), but it's disabled by default due to a performance flaw. We want to at least measure and display this computational cost with https://github.com/haskell/cabal/issues/4594 (do you have in mind some sort of progress bar?). But here's a potential naive question: the cost is only incurred when the solver fails, right? It doesn't slow down successful builds, only the faulty ones, correct?
>
> Could we consider a scenario where cabal fails quickly with a minimal message (potentially without any solver output), but encourages the user to rerun the command with a, e.g., --diagnostic flag (which AFAIU -v3 currently stands for?). This flag would then trigger the computation of the minimal conflict set, regardless of its slow performance.
>
> To clarify, my underlying suggestion is that, for a better developer experience, we might prefer no solver output and a clear error message that encourages the user to seek a comprehensive solver output diagnostic. This approach could be preferable to having a confusing output by default, which may mislead a user who may not instinctively use the correct diagnostic flags to debug their issue. What are your thoughts on this?

I think that adding a flag to show high quality output from the solver (and minimal output otherwise) makes sense if the output is expensive to generate. However, I think that the --minimize-conflict-set feature is only a small part of making the output easy to understand. I think we would at least need to make these changes:

1. Show all backtracking. This information is currently only shown with -v3, but it is currently far too long and difficult to read.
2. Only show the packages that are involved in the conflict that led to the failure. This feature is currently available with --minimize-conflict-set. When the flag is set, the -v3 log shows the whole process of shrinking the conflict set, which means that it shows multiple runs of the solver. Only the last run of the solver in the -v3 log should be shown by default, though.
3. Summarize information in the log more effectively, for example, avoid listing all versions of a package that conflicted with a constraint (#4251).
4. Find a way to limit the size of the output from the solver. A conflict can have an arbitrarily large number of variables, so there will still be conflicts that can't be summarized in a concise way. In that case, the solver could fall back to truncating the log after the first backtracking, as it currently does.
5. Optional: Display the information as a tree diagram.
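
The summarizing and size-limiting points in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not how cabal formats its log: the function names summarize_rejections and truncate, the output wording, and the line limit are all assumptions.

```python
def summarize_rejections(package, versions, reason, shown=1):
    """List the first `shown` rejected versions and collapse the rest into a count."""
    lines = [f"{package}-{version} ({reason})" for version in versions[:shown]]
    hidden = len(versions) - shown
    if hidden > 0:
        noun = "version" if hidden == 1 else "versions"
        lines.append(f"{package} (skipping {hidden} other {noun})")
    return lines


def truncate(lines, limit):
    """Keep at most `limit` lines and say how many were left out."""
    if len(lines) <= limit:
        return lines
    hidden = len(lines) - limit
    return lines[:limit] + [f"... ({hidden} more line(s) omitted)"]


# Versions of io-classes excluded by the same constraint in the original trace.
excluded = ["1.1.0.0", "1.0.0.1", "1.0.0.0", "0.6.0.0", "0.5.0.0", "0.4.0.0"]
reason = "excluded by '^>=0.3' from 'ouroboros-network-testing'"

for line in summarize_rejections("io-classes", excluded, reason):
    print(line)
```

Collapsing by shared reason keeps one representative version per constraint, while truncate is the fallback for conflicts that stay long even after summarizing.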

Mikolaj commented 11 months ago

Related: the "(constraint from minimum version of Cabal used by Setup.hs requires >=3.8)" part of the error message in https://github.com/haskell/cabal/issues/9294 could also use some improvement (it comes from an extra restriction on Custom setup scripts, see the ticket).

yvan-sraka commented 9 months ago

Hi everyone,

Quite a while has passed since I posted this RFC, and my approach has somewhat diverged from the original plan. Here's an update on what has been done, what I'm currently working on (which may need reviews), and what remains to be done:

* The Cabal external command system https://github.com/haskell/cabal/pull/9063 has been merged. A follow-up by @mpickering https://github.com/haskell/cabal/pull/9412 should be merged soon too;
* Some refactoring of the solver logic has been merged https://github.com/haskell/cabal/pull/9282 (with the accompanying suggestion to add a deprecation warning https://github.com/haskell/cabal/pull/9206 before removing more code). Other changes are still under review https://github.com/haskell/cabal/pull/9159;
* I'm currently facing bugs in a side quest https://github.com/haskell/cabal/pull/9160. Also, there seems to be no strong consensus on other low-hanging fruit, such as adding ANSI colors or fancy emojis to solver logs.

So, what's the next step? I'm focusing on completing https://github.com/haskell/cabal/pull/9465 and experimenting with external tools to which I could pipe a mechanized (JSON) alternative solver log output. The rationale behind this is that changing the display of solver logs without altering the (sensitive) solver logic itself is hard. While verbose logs of the solver help in understanding its backtracking algorithm, they are not suitable as default output for standard users. The best approach, IMHO, is to display it dynamically, mimicking what @maralorn's nix-output-monitor does on top of nix-build --log-format internal-json -v. I believe this is the quickest route to achieving the "tree" view suggested in the original RFC. Also, experimenting with an external command allows greater freedom to iterate based on user feedback and to determine the most convenient output before ultimately integrating it upstream, with a fallback to the current output that users may have relied on (even though considering it a stable API might be unwise). WDYT?
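
To give a feel for what an external tool could consume, here is a minimal Python sketch that turns already-unwrapped solver log entries into JSON lines and then filters them on the consumer side. Cabal does not emit such a format today: the event fields (depth, action, detail) and the function names are purely hypothetical and serve only to show the producer/consumer split.

```python
import json
import re

# Matches entries such as "[__3] rejecting: io-classes-1.1.0.0 (...)".
ENTRY = re.compile(
    r"^\[_*(\d+)\] (trying|next goal|rejecting|skipping|fail)\b:? ?(.*)$"
)


def to_event(entry):
    """Turn one textual log entry into a dict, or None if it is not a solver step."""
    match = ENTRY.match(entry)
    if match is None:
        return None
    depth, action, detail = match.groups()
    return {"depth": int(depth), "action": action, "detail": detail}


def encode(entries):
    """Producer side: one JSON document per line (hypothetical wire format)."""
    events = (to_event(entry) for entry in entries)
    return [json.dumps(event) for event in events if event is not None]


def failures(json_lines):
    """Consumer side: keep only the events a user needs to see first."""
    events = [json.loads(line) for line in json_lines]
    return [e["detail"] for e in events if e["action"] in ("rejecting", "fail")]


log = [
    "[__0] trying: ouroboros-network-0.6.0.0 (user goal)",
    "[__3] next goal: io-classes (dependency of ouroboros-network)",
    "[__3] rejecting: io-classes-0.3.0.0 (conflict: ouroboros-network => io-classes^>=1.1)",
    "[__3] fail (backjumping, conflict set: io-classes, ouroboros-network)",
]

for detail in failures(encode(log)):
    print(detail)
```

With a structured stream like this, the display logic (tree view, colors, progress) can evolve in an external command without touching the solver itself.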
