pubgrub-rs / advanced_dependency_providers

experiment with the DependencyProvider trait
1 stars 0 forks source link

"lockfiles" #4

Open Eh2406 opened 3 years ago

Eh2406 commented 3 years ago

"lockfiles" is a way to encode and reuse the output from a previous resolution, and even if the dependencies have changed keep things as similar as possible. In Cargo this is done by a set of heuristics that are part of the dependency provider, and it is a mess. In dart this is somehow worked into the resolver.

This issue is to split off and continue the conversation from https://github.com/pubgrub-rs/pubgrub/issues/39

Eh2406 commented 3 years ago

To summarize the comments from https://github.com/pubgrub-rs/pubgrub/issues/39 so far:

mpizenberg: https://github.com/pubgrub-rs/pubgrub/issues/39#issuecomment-710786729

Lockfiles

I can see different levels of support for this functionality, depending on the level of trust we give to the lockfile.

aleksator: https://github.com/pubgrub-rs/pubgrub/issues/39#issuecomment-710797433

The problem comes when updating a set of dependencies (bumping dependency version in Cargo.toml) with a lockfile present. The packages should be kept as closely as possible to specified lockfile while still performing needed upgrades.

mpizenberg: https://github.com/pubgrub-rs/pubgrub/issues/39#issuecomment-710833034

Interesting! Is there a formal "distance" specification in the space of versions. A way of ranking different choices?

In case versions in the lockfile are compatible with changes in Cargo.toml there is nothing special to do. In case only one package changes to a new constraint incompatible with the locked version, it's also straightforward. In case multiple packages change there could be plenty of different approaches. Here are some ideas.

1) The efficient-driven approach: don't care, we just put locked versions first in the list of available versions. If they are picked as compatible, great, if they aren't, well that's life. By design (locked versions are tried first) it will result in few changes. 2) The comprehensive approach: pick a distance metric (like 0 if exact the same, 1 if different, or 1 for patch bump, 2 for minor bump, etc. just any metric), then use a hashmap that keeps the order of packages (we had discussed that in https://github.com/mpizenberg/pubgrub-rs/pull/13#issuecomment-704117807) and run the solver for every order possible of those packages, keep the one with the lowest distance to the lockfile. That is obviously a waste of time, but it's the most accurate way to get a "minimal changes" new lockfile. 3) The random reward-driven approach: randomly sample a few different orders, at least one per different package first, and leverage the should_cancel by providing some feedback, like current decisions in the partial solution to only continue the most rewarding runs being done and cut the others. This may not play well with backtracking though since we may keep a runner on a path that currently has a lower distance, but then will have to backtrack.

I think in most cases, (1) is the best choice. It's also a choice that doesn't require any change to the current state of pubgrub.

aleksator: https://github.com/pubgrub-rs/pubgrub/issues/39#issuecomment-710866315

Oh, it's rather binary: is it possible to use the exact version specified in the lockfile? Yes -> use it, no -> use normal dependency resolution ("unlock it").

So if my understanding is correct, the strategy 1 should be used and indeed the support of lock files delegated to DependencyProvider implementation.

Eh2406 commented 3 years ago

In Cargo we definitely do 1. and it is an important tool. But we have another layer of hacks on top because of another property of Lockfiles. If the lockfile can satisfy all dependencies then we do not ask the index about new versions. Updating the index is an expensive step and we only want to do it if we need to.

The hacks we do in Cargo involve replacing Ranges with "=" constraints if there is a version in the lockfile that matches, then have "list_available_versions" not update the index if the Range is an "=" constraint. This is a mess, and often wrong.

But I think it can be done well with the changes in https://github.com/pubgrub-rs/pubgrub/pull/50 that removes "list_available_versions". If there is a package that is in the lockfile, then decide on that before other packages. If there is a version in the lockfile that is contained in the Range then return it first; else update the index and take the best.