Heuristic counting dependencies

mpizenberg commented 3 years ago

I implement a variant of the heuristics we talked about using both the number of versions and the number of dependencies to pick a package. In this implementation, what I'm doing is:

if there is no valid version, pick that package
if there is at least one valid version, pick the package with the lowest score, where score is computed as : number of valid versions + number of dependencies of the version we would pick.

I'm seeing respectively 7%, 45% and 10% improvements on the elm, zuse, synthetic benchmarks.

Eh2406 commented 3 years ago

45% is nothing to sneeze at! Definitely worth more investigation. I wonder how much this is effected by the indirect packages in zuse. Like:

    "phf_shared-phf_generator-Minor(7)->=0.7.24, <0.8.0": {
        "0.7.0": {
            "phf_shared-Minor(7)": [
                ("0.7.24", Some("0.7.25")),
            ],
        },
    },

makes for some long chains to deal with, and this change should prioritize getting them delt with.

mpizenberg commented 3 years ago

Any kind of heuristic change may behave better or worse depending on a specific set of dependencies indeed. I think prioritizing dependencies with a low number of dependencies should generally improve performances. But if we want to make sure that is the case in the context of crates.io, a good way to measure the impact of such a change would be to evaluate how it impacts the following measures:

mean resolution time, or equivalently total resolution time of all packages
median, 1/10th and 1/100th worst percentile resolution time
the worst (package, version) resolution time

Eh2406 commented 3 years ago

I can confirm the numbers you are seeing. I am seeing a smaller but still significant improvement when on top of the priority-queue branch (-20.433% for zuse and -17% for hobo). I will try to run the "all crates" generator for comparison next time I have a chance.

Eh2406 commented 3 years ago

I reran the "all crates" generator and it took about the same 21hr as it did last time. That is when I realized that I had changed OfflineDependencyProvider not the custom one used by the generator. :facepalm: So did it again with a change to the correct DependencyProvider. It is still running 37hr later. So that suggests some things to research when it finishes:

I used get_dependencies to count the deps. It is cached, but still does a Dependencies<>.clone(). I don't think this should matter, but it is worth a check.
Find an example where this is slower. That is not hard to do, the hard part if finding one that also runs in a usable amount of time as a Ron file.
As things went by solana-cli-config=1.4.9 (or maybe 1.3.7) looked worth more investigation. It looked like it took >100 sec, witch seams really odd given the boring lists of deps. And that Cargo handles that set of deps almost instantaneously.

So the outcome of this experiment may be "the generator is making problems that are harder then needed" or "pubgrub handles some common case badly" instead of something about this Heuristic.

Eh2406 commented 3 years ago

Sorry for the delay. I have not yet found a ron file that criterion can run that is slower with this heuristic. I will continue investigating, but it should not delay preceding here. What do you think the next step is?

mpizenberg commented 3 years ago

I don't mind waiting as long as it needs before introducing another heuristic like this. You definitely spotted a problem when running all crates, so I'd prefer finding out what the problem is first.

Also, I won't have much time this summer to investigate this because of other things I'll be organizing. But I wouldn't want to block people interested in contributing during all that period. So maybe the safest thing is just deciding that we cut 0.2.1 here so that the release branch is in a suitable state for whoever would want to use it as base for experiments on the "pre-release versions" problem.

Eh2406 commented 3 years ago

At a minimum this suggests that we need to add a comment "The best heuristic may depend on your work load." to the Guide, and that for v0.3 we should document that "OfflineDependencyProvider impls whatever heuristic works best for us, and it can change in patch versions."

Eh2406 commented 3 years ago

I have done a lot of work optimizing the "all crates" script on top of the priority-queue branch. It can now run in 4069s! So time to try this Heuristic again. Unfortunately, it took 11019s, or 2.7x slower. 🤷

mpizenberg commented 1 year ago

Alright, the idea seemed good but apparently degraded performances in general. So let's just keep it in mind and move on to other useful things.

pubgrub-rs / pubgrub

Heuristic counting dependencies #95