pubgrub-rs / pubgrub

PubGrub version solving algorithm implemented in Rust
https://pubgrub-rs.github.io/pubgrub/pubgrub/
Mozilla Public License 2.0

feat: merge dependencies for better error messages #163

Closed Eh2406 closed 6 months ago

Eh2406 commented 7 months ago

This is the "simplest thing that could possibly work": when adding a new dependency incompatibility, we check whether there is an existing incompatibility with the same parent package, dependent package, and dependent version set. If so, we construct a new incompatibility that represents the merge of the two. We remove the old incompatibility from all lookup tables and add the new one. This effectively brings us on par with Dart for #19. When it applies, this can massively reduce the number of incompatibilities the algorithm scans through, so it may help a little with #135, depending on whether the real problems contain this exact case. The union of versions done when merging incompatibilities here is definitely a candidate for #163.
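A minimal sketch of that bookkeeping, assuming a simplified model where version sets are plain sets of integers and the key is (parent, dependent, dependent's version set); the real `Incompatibility` and version-set types in this crate are richer, and the names below are made up:

```rust
use std::collections::{BTreeSet, HashMap};

// Simplified stand-in for a dependency incompatibility:
// "`parent` at `parent_versions` depends on `dependent` at `dep_versions`".
#[derive(Clone, Debug, PartialEq)]
struct DepIncompat {
    parent: String,
    parent_versions: BTreeSet<u32>,
    dependent: String,
    dep_versions: BTreeSet<u32>,
}

// Lookup keyed by (parent, dependent, dependent's version set). If a new
// incompatibility matches an existing key, the old entry is replaced by a
// merged one whose parent version set is the union of both.
#[derive(Default)]
struct Store {
    by_key: HashMap<(String, String, BTreeSet<u32>), DepIncompat>,
}

impl Store {
    fn add(&mut self, incompat: DepIncompat) {
        let key = (
            incompat.parent.clone(),
            incompat.dependent.clone(),
            incompat.dep_versions.clone(),
        );
        self.by_key
            .entry(key)
            .and_modify(|existing| {
                // Merge: union the parent version sets in place.
                existing
                    .parent_versions
                    .extend(incompat.parent_versions.iter().copied());
            })
            .or_insert(incompat);
    }
}

fn main() {
    let mut store = Store::default();
    let bar: BTreeSet<u32> = [1].into_iter().collect();
    for v in 1..=5 {
        store.add(DepIncompat {
            parent: "foo".into(),
            parent_versions: [v].into_iter().collect(),
            dependent: "bar".into(),
            dep_versions: bar.clone(),
        });
    }
    // Five "foo v depends on bar 1" incompatibilities collapse into one.
    assert_eq!(store.by_key.len(), 1);
    let merged = store.by_key.values().next().unwrap();
    assert_eq!(merged.parent_versions.len(), 5);
    println!("merged parent versions: {:?}", merged.parent_versions);
}
```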

The big advantage of this is the improvement to error messages. In fact, this is split into several commits: the first adds a test case with the old error message, and the second shows the new error message. Our error messages tend to have "holes" constructed by discovering that individual versions don't work. When each version doesn't work for a different reason, we are in a complicated mess and the "holes" are the least of the user's problems. In this case, the only reason we discovered the issues one at a time is that we loaded the data one version at a time, which is not useful information for the user to see. I consider this improvement to error messages as justifying a lot of "half-baked" details of this approach.

Unfortunately, this is pure overhead when the optimization does not apply. Most problematically, it does not apply if we never load more than one version of a package, that is to say, when we easily construct a solution. All told, this adds a 5% to 20% cost on our benchmarks. After several days of experimenting, this slowdown is caused not by any of the inefficient/half-baked things that I would like to improve. :-( Instead, it is caused by constructing, populating, and dropping the additional hash table.

Can you think of a more space efficient way to find the incompatibility to merge with? Do you think the performance cost is worth the error message improvement? What is the performance cost on your preferred workload?

mpizenberg commented 7 months ago

The merge_incompatibilities function is a bit hard to read (or maybe it's getting too late ^^). If you could re-explain the approach it would help.

From my point of view, improving error messages and some worst-case scenarios at the cost of degrading the easy solutions is worth it.

Eh2406 commented 7 months ago

You are not wrong. I got distracted by the performance problems and forgot to clean up the code. Sorry.

Eh2406 commented 7 months ago

Rebased and added a commit splitting things up and added some comments. Hopefully this is clearer.

Eh2406 commented 7 months ago

I have a variation that does not have a dedicated map for dependencies, but instead scans the full list of incompatibilities for matches. In the happy path this is much faster, as there are fewer allocations. Unsurprisingly, larger examples grind to a halt, spending all their time checking incompatibilities that are not dependencies.

I'm still thinking about ways to have our cake and eat it too: something that only allocates if we are in a large case where it would be more efficient, and uses scanning when inputs are small. But I have not thought of a solution that is efficient in both cases and simple to implement.
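One shape such a hybrid could take (a sketch with made-up names and a threshold that would need real benchmarking, not the crate's actual data structures):

```rust
use std::collections::HashMap;

// Hypothetical hybrid lookup: scan the flat list while it is small, and
// only pay for a HashMap once the list grows past SCAN_LIMIT.
const SCAN_LIMIT: usize = 32; // assumption: would need tuning on real benchmarks

#[derive(Default)]
struct HybridIndex {
    items: Vec<(String, u32)>,         // (lookup key, incompatibility id)
    map: Option<HashMap<String, u32>>, // built lazily, kept in sync afterwards
}

impl HybridIndex {
    fn insert(&mut self, key: String, id: u32) {
        if let Some(m) = self.map.as_mut() {
            m.insert(key.clone(), id);
        }
        self.items.push((key, id));
    }

    fn find(&mut self, key: &str) -> Option<u32> {
        if self.items.len() <= SCAN_LIMIT {
            // Small input (the happy path): a linear scan, no extra allocation.
            return self
                .items
                .iter()
                .rev()
                .find(|(k, _)| k == key)
                .map(|&(_, id)| id);
        }
        // Large input: build the map once, then answer lookups in O(1).
        if self.map.is_none() {
            self.map = Some(self.items.iter().cloned().collect());
        }
        self.map.as_ref().unwrap().get(key).copied()
    }
}

fn main() {
    let mut idx = HybridIndex::default();
    for i in 0..100u32 {
        idx.insert(format!("pkg{i}"), i);
    }
    assert_eq!(idx.find("pkg7"), Some(7));
    assert_eq!(idx.find("missing"), None);
    println!("lookups ok; map built: {}", idx.map.is_some());
}
```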

mpizenberg commented 7 months ago

I haven't taken the time to re-read the code yet. However, I watched PackagingCon 2023, and this talk about explainability of errors in Spack could be relevant: https://youtu.be/z0AB4g2KDX8?si=tpz1DOEYdLXVmPJw. One idea is that the overhead could be run only in a second stage, if an error was found. This is akin to the "simplify" that was recently merged, but I guess something like this could also be done for what is proposed here. Something like the following:

  1. solve without overhead
  2. if that's a success, fine nothing more to do
  3. if that's an error, run a simplification step on all recorded incompatibilities and rerun the solver with simplified recorded incompats already in.

With the knowledge that (3) brings, I guess that second run should be fairly fast, and with simplified incompatibilities maybe it takes better shortcuts and gives a better erroring path overall?
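For what it's worth, the control flow of that idea could look something like this (a sketch with placeholder closures standing in for the real solver, recorded-incompatibility extraction, and simplification; none of these are actual pubgrub APIs):

```rust
// Hypothetical two-stage driver: `solve` accepts a seed of previously
// recorded incompatibilities, `recorded` extracts them from an error, and
// `simplify` shrinks them. All three are stand-ins, not real pubgrub APIs.
fn solve_twice<S, E>(
    solve: impl Fn(&[String]) -> Result<S, E>,
    recorded: impl Fn(&E) -> Vec<String>,
    simplify: impl Fn(Vec<String>) -> Vec<String>,
) -> Result<S, E> {
    match solve(&[]) {
        // 1-2. Solve without any overhead; on success there is nothing to do.
        Ok(solution) => Ok(solution),
        // 3. On error, simplify what was recorded and rerun with it seeded in.
        Err(err) => {
            let seed = simplify(recorded(&err));
            solve(&seed)
        }
    }
}

fn main() {
    // Toy stand-ins: the first run "fails", the seeded rerun "succeeds".
    let solve = |seed: &[String]| {
        if seed.is_empty() {
            Err("unreadable 2000-line derivation".to_string())
        } else {
            Ok(format!("derivation built from {} simplified incompats", seed.len()))
        }
    };
    let recorded = |_e: &String| vec!["a".to_string(), "a".to_string(), "b".to_string()];
    let simplify = |mut v: Vec<String>| {
        v.dedup(); // toy simplification: drop adjacent duplicates
        v
    };
    let result = solve_twice(solve, recorded, simplify);
    assert_eq!(
        result,
        Ok("derivation built from 2 simplified incompats".to_string())
    );
    println!("{:?}", result);
}
```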

Eh2406 commented 7 months ago

I tested some of the larger examples I generated from cargo use cases. They do not hit this pattern. The tool I have for generating these examples is very out of date; I did not check whether it could be updated to produce this pattern.

This also gave me the opportunity to measure the absolute overhead of this approach, which is quite small. The relative overhead is high because Elm is the only input in my benchmark suite to have this pattern, and it has very little backtracking.

I'd like to hear from @zanieb or @konstin about whether their real world use cases hit this code path. If they do, we can always improve performance later.

konstin commented 6 months ago

There is a nice speedup:

Benchmark 1: ./main
  Time (mean ± σ):      8.028 s ±  0.439 s    [User: 7.956 s, System: 0.076 s]
  Range (min … max):    7.550 s …  8.956 s    10 runs

  Warning: Ignoring non-zero exit code.

Benchmark 2: ./branch
  Time (mean ± σ):      5.673 s ±  0.288 s    [User: 5.355 s, System: 0.321 s]
  Range (min … max):    5.411 s …  6.238 s    10 runs

  Warning: Ignoring non-zero exit code.

Summary
  ./branch ran
    1.42 ± 0.11 times faster than ./main
konstin commented 6 months ago

The error message becomes 5 lines instead of 2000

zanieb commented 6 months ago

Are the lines unreadably long? That's my primary concern, although perhaps we can handle splitting lines into readable pieces separately.

I'm making good progress on packaging scenarios for testing error messages at https://github.com/zanieb/packse

Eh2406 commented 6 months ago

I believe, as of this PR, the lines are still long. However, they get much shorter if simplify is run on them. That can be done in error reporting with the code as it currently is. Alternatively, simplify can be added to the algorithm to apply it in at least this case, which I plan to play with in a follow-up PR.

@mpizenberg can / do you want to look at this before it gets merged?

mpizenberg commented 6 months ago

I haven't heard about long lines before. Is this something discussed in the office hours?

I can have another look at this Friday if you are ok waiting two more days. Otherwise I trust you.

mpizenberg commented 6 months ago

I believe, as of this PR, the lines are still long. However, they get much shorter if simplify is run on them.

I'm curious about an example for this if possible.

Eh2406 commented 6 months ago

I haven't heard about long lines before. Is this something discussed in the office hours?

I think the test added in this PR is a great example. In the before case there were O(#versions) lines, each O(#versions) long. After, there are 2 lines, but the lines are still O(#versions) long. The test case has #versions == 6, but the real world case has hundreds or thousands of versions.

I believe, as of this PR, the lines are still long. However, they get much shorter if simplify is run on them.

I'm curious about an example for this if possible.

The line

Because there is no available version for bar and foo 1 | 2 | 3 | 4 | 5 depends on bar, foo 1 | 2 | 3 | 4 | 5 is forbidden.

becomes something like

Because there is no available version for bar and foo * depends on bar, foo * is forbidden.

The line

And because there is no version of foo in <1 | >1, <2 | >2, <3 | >3, <4 | >4, <5 | >5 and root 1 depends on foo, root 1 is forbidden.

becomes something like

And because there is no version of foo in (the empty set) and root 1 depends on foo, root 1 is forbidden.
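The display change can be illustrated with a toy version-set printer (a sketch; the crate's real simplify works on its range types against the set of available versions, not on integer sets like this):

```rust
use std::collections::BTreeSet;

// Toy printer: a set equal to everything available prints as `*`, an empty
// set prints as "(the empty set)", and anything else is listed explicitly.
fn display_set(set: &BTreeSet<u32>, available: &BTreeSet<u32>) -> String {
    if set.is_empty() {
        "(the empty set)".to_string()
    } else if set == available {
        "*".to_string()
    } else {
        set.iter()
            .map(|v| v.to_string())
            .collect::<Vec<_>>()
            .join(" | ")
    }
}

fn main() {
    let available: BTreeSet<u32> = (1..=5).collect();
    let all: BTreeSet<u32> = (1..=5).collect();
    let some: BTreeSet<u32> = [1, 3].into_iter().collect();
    assert_eq!(display_set(&all, &available), "*");
    assert_eq!(display_set(&some, &available), "1 | 3");
    assert_eq!(display_set(&BTreeSet::new(), &available), "(the empty set)");
    println!("all versions display as: {}", display_set(&all, &available));
}
```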

I would like to defer conversation about future improvements to future PRs/issues, so that we can focus on getting this improvement merged.

I can have another look at this Friday if you are ok waiting two more days. Otherwise I trust you.

Happy to wait! If I don't get an update, I will merge on Monday. If you spot something after that, open an issue and I will get it fixed.

Eh2406 commented 6 months ago

I will rerun and annotate my usual benchmarks at some point today.

Eh2406 commented 6 months ago
large_cases/elm_str_SemanticVersion.ron
                        time:   [218.63 ms 218.81 ms 219.01 ms]
                        change: [+20.302% +20.450% +20.593%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild
large_cases/large_case_u16_NumberVersion.ron
                        time:   [9.6057 ms 9.6465 ms 9.6910 ms]
                        change: [+7.7815% +8.2632% +8.7441%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe
large_cases/zuse_str_SemanticVersion.ron
                        time:   [226.41 ms 226.75 ms 227.10 ms]
                        change: [+15.258% +15.512% +15.774%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

It also looks like hobo_str_SemanticVersion has a ~2% regression, although it is hard to tell as the noise is larger.

The optimization can hit in elm, but there is very little backtracking, so it doesn't get used much. All the overhead is building the map. It is not expensive (44 ms), but it is still 20% overhead.

The randomly generated large_case also does not use the optimization much. The map is cheaper to construct there, as hashing a u16 is cheaper than hashing a str.

zuse never utilizes the optimization, due to the way package names were generated.

Overall, I wish the performance impact were smaller, but we can always work on it in follow-up PRs.