Closed mpizenberg closed 1 year ago
45% is nothing to sneeze at! Definitely worth more investigation. I wonder how much this is effected by the indirect packages in zuse. Like:
"phf_shared-phf_generator-Minor(7)->=0.7.24, <0.8.0": {
"0.7.0": {
"phf_shared-Minor(7)": [
("0.7.24", Some("0.7.25")),
],
},
},
makes for some long chains to deal with, and this change should prioritize getting them delt with.
Any kind of heuristic change may behave better or worse depending on a specific set of dependencies indeed. I think prioritizing dependencies with a low number of dependencies should generally improve performances. But if we want to make sure that is the case in the context of crates.io, a good way to measure the impact of such a change would be to evaluate how it impacts the following measures:
I can confirm the numbers you are seeing. I am seeing a smaller but still significant improvement when on top of the priority-queue
branch (-20.433% for zuse and -17% for hobo). I will try to run the "all crates" generator for comparison next time I have a chance.
I reran the "all crates" generator and it took about the same 21hr as it did last time. That is when I realized that I had changed OfflineDependencyProvider
not the custom one used by the generator. :facepalm: So did it again with a change to the correct DependencyProvider
. It is still running 37hr later. So that suggests some things to research when it finishes:
get_dependencies
to count the deps. It is cached, but still does a Dependencies<>.clone()
. I don't think this should matter, but it is worth a check.solana-cli-config=1.4.9
(or maybe 1.3.7
) looked worth more investigation. It looked like it took >100 sec, witch seams really odd given the boring lists of deps. And that Cargo handles that set of deps almost instantaneously.So the outcome of this experiment may be "the generator is making problems that are harder then needed" or "pubgrub handles some common case badly" instead of something about this Heuristic.
Sorry for the delay. I have not yet found a ron file that criterion can run that is slower with this heuristic. I will continue investigating, but it should not delay preceding here. What do you think the next step is?
I don't mind waiting as long as it needs before introducing another heuristic like this. You definitely spotted a problem when running all crates, so I'd prefer finding out what the problem is first.
Also, I won't have much time this summer to investigate this because of other things I'll be organizing. But I wouldn't want to block people interested in contributing during all that period. So maybe the safest thing is just deciding that we cut 0.2.1 here so that the release branch is in a suitable state for whoever would want to use it as base for experiments on the "pre-release versions" problem.
At a minimum this suggests that we need to add a comment "The best heuristic may depend on your work load." to the Guide, and that for v0.3 we should document that "OfflineDependencyProvider impls whatever heuristic works best for us, and it can change in patch versions."
I have done a lot of work optimizing the "all crates" script on top of the priority-queue
branch. It can now run in 4069s! So time to try this Heuristic again. Unfortunately, it took 11019s, or 2.7x slower. 🤷
Alright, the idea seemed good but apparently degraded performances in general. So let's just keep it in mind and move on to other useful things.
I implement a variant of the heuristics we talked about using both the number of versions and the number of dependencies to pick a package. In this implementation, what I'm doing is:
I'm seeing respectively 7%, 45% and 10% improvements on the elm, zuse, synthetic benchmarks.