Open baszalmstra opened 2 years ago
For some context #36 especially this comment and the PR #55.
Our surface level interface does not have a way to express this. It can only say "A depends on B" meaning "do not build the solution where a version of A is selected and a version of B is not also selected". However this is just a limitation of our interface. The internal Incompatibility data structure can happily express more complicated constraints like "do not build the solution where a version of A is selected and a version of B is selected". (Along with arbitrarily more exotic things.)
So far we've done a pretty good job of having a user-friendly public interface and documenting it at the module level, package level, and book level. I would be open to designing a more flexible interface as long as it continued to be easy to use. Alternatively we could design a lower level interface as long as we have a good pattern for how straightforward use cases can be done easily. As you've noticed no one has had time to work on this project given its existing level of polish, so maybe we should lower our standards to let you get unstuck.
PS, there is also an optimisation on the conflict resolution backtracking algo introduced by Jacob that is not exactly what we should do but is faster and equivalent considering our interface to build constraints. If we allow more expressive constraints. We need check that this optimisation is still correct. And we also need to adapt the generators for the property tests.
On Thu, Oct 13, 2022, 17:56 Jacob Finkelman @.***> wrote:
For some context #36 https://github.com/pubgrub-rs/pubgrub/issues/36 especially this comment https://github.com/pubgrub-rs/pubgrub/issues/36#issuecomment-709400796 and the PR #55 https://github.com/pubgrub-rs/pubgrub/pull/55.
Our surface level interface does not have a way to express this. It can only say "A depends on B" meaning "do not bill the solution where a version of A is selected and a version of B is not also selected". However this is just a limitation of our interface. The internal Incompatibility https://github.com/pubgrub-rs/pubgrub/blob/release/src/internal/incompatibility.rs#L33 data structure can happily express more complicated constraints like "do not bill the solution where a version of A is selected and a version of B is selected". (Along with arbitrarily more exotic things.)
So far we've done a pretty good job of having a user-friendly public interface and documenting it at the module level, package level, and book level. I would be open to designing a more flexible interface as long as it continued to be easy to use. Alternatively we could design a lower level interface as long as we have a good pattern for how straightforward use cases can be done easily. As you've noticed no one has had time to work on this project given its existing level of polish, so maybe we should lower our standards to let you get unstuck.
— Reply to this email directly, view it on GitHub https://github.com/pubgrub-rs/pubgrub/issues/120#issuecomment-1277836263, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWFOCIBVI4FKQMV2G22FBLWDAWJZANCNFSM6AAAAAAREEAE4Q . You are receiving this because you are subscribed to this thread.Message ID: @.***>
This comment and the few following it are what I had in mind in my previous comment: https://rust-lang.zulipchat.com/#narrow/stream/260232-t-cargo.2FPubGrub/topic/partial_solution/near/240019814
Basically, we make the hypothesis that a package term first appearance is necessarily coming from a dependency from a previously selected package in the partial solution. If we change the expressiveness of what users can put in dependencies, we just need to make sure that it doesn't contradict the shortcut we take when identifying the previous satisfier and the id of the step to backtrack.
If there is no issue, it's fine. And if there is an issue, we probably can just revert to the previous stricter approach of backtracking.
@Eh2406 I would very much appreciate getting unstuck! But I would also love to help keep up the level of polish for this project if at all possible!
Would it be an idea to modify the interface of the DependencyProvider
to also return how a dependency should be used? Whether it should be included in the solution or not? I'm not really sure what other kinds of dependencies one might want to express.
Yes we will have to change the return type of get_dependencies
from Result<Dependencies<P, VS>, Box<dyn Error>>
.
We could add in Enum for whether each dependency is required
or constrained
(wordsmithing required), something like Result<Dependencies<P, Requirement<VS>>, Box<dyn Error>>
. This lets us add bespoke syntax for each kind of dependency we have found a use for, but this is beginning to get a little complicated.
The other kind of dependency we may want to add in the future is for modeling "week dependencies". (Cargo currently handles this is a post resolution pass.) Where fundamentally we want to say "if A and B are selected then C must also be selected".
While that's all I can think of at the moment, I wouldn't be surprised to discover there are more things people need. At some point we will probably need to add some kind of raw direct access for particularly weird cases. Possibly as a direct modification like #121.
Extra credit: There are two minor performance issues with the current API. Both examples of things that I wouldn't even know or care about if this were written in another language, but bug me because it's obvious in Rust. (Previous discussion about the same problems with an older version of the API at #18.) The only thing we do with the Dependencies
returned from get_dependencies
is look at each item one at a time and convert it to an Incompatibility
. (The code looks a little more complicated because of interning.)
Dependencies
to iterate over it. A caching dependency provider ends up having to clone this hash map just for us to iterate over it and throw it out.The direct solution to the first problem is to take a reference Result<&Dependencies<P, VS>, Box<dyn Error>>
, but that does not work for non-caching dependency providers. We could solve it by making everyone's life more complicated with something like Result<Cow<Dependencies<P, VS>>, Box<dyn Error>>
, but that is a lot of complexity budget given the size of the problem.
Solving the second by returning Result<impl Into<Incompatibilitie<P, VS>>, Box<dyn Error>>
, also blows the complexity budget even before we realize that impl trait is not supported in traits. So it actually involves in associated type, or two. And last time I tried it requires GATs. (lending iterators.)
I don't mean to say that these extra credit problems need to be solved (now or at all). Just saying if we end up with a high level "just return a hash map of dependencies" and a low level "give us the incompatibilities to add" it would be nice if the low level at least did not have these issues.
Thanks for the info, I will give it a go.
Ill start simple by implementing your suggestion: Result<Dependencies<P, Requirement<VS>>, Box<dyn Error>>
where Requirement
is an enum of Required
, and Constrainted
.
Im trying to figure out how to start:
Incompatibility
struct, should a "constraint" be added as a new "Kind" but act the same as a dependency? Or should the package_terms
be (in terms of how it is documented) "not A = 1, B = 2` instead of "A = 1, not B = 2"?Honestly, I think the most important thing to start with would be improving the property test generator strategies to be able to add more general constraints (like any incompatibilities). I'd feel much less safe not knowing we have property tests to check such a change in the expressiveness of user constraints.
Could you provide some hints as to what you had in mind? Im still a bit in over my head with this project but Im happy to give it a go!
@Eh2406 has written in the guide a bit about how our property based strategies are done to generate sets of packages and dependencies https://pubgrub-rs-guide.netlify.app/testing/property.html#end-to-end-property-testing I haven't looked at that for a long time. What would be awesome would be trying to find strategies to generate valid sets with more expressive dependencies, up to the expressiveness we want to allow in the API. Maximum expressiveness would be to allow any kind of valid incompatibility. (Incompatibilities are explained here https://pubgrub-rs-guide.netlify.app/internals/incompatibilities.html). Currently the only kind of incompatibilities allowed coming from users are of the shape:
{ a : range_a, b : not(range_b) }
meaning versions within range_a of package a depends on versions within range_b of package b. Actually even stricter, not range_a but exactly one version v1 since we ask dependencies of a specific package version.
If we want for example to allow users to specify constraints directly, such as (1) "this package a at version v1 is incompatible with all versions of package b", or (2) "this package a cannot be installed simultaneously with both b and c", these would be represented by the following incompatibilities (notice the absence of not
compared to previously):
(1) { a : v1, b : all }
(2) { a : v1, b : all, c : all }
And if we allow users to provide these kind of incompatibilities via pubgrub API, we want to have property based strategies able to check situations with similar expressiveness. So this may not be trivial, but it's an important step to stay confident that an implementation is correct. We have had the current property tests finding issues with pubgrub when it was v0.1 so it's important that we try to stay bug-free.
Sorry I did not get time to respond to you yesterday.
What exactly constitutes adding a package to the bill?
"bill" was a typo for "build". You kind of have to expect typos when communicating with me, I understand that this makes things difficult and appreciate everyone's consideration. Thank you for asking for clarification.
You pointed to the
Incompatibility
struct, should a "constraint" be added as a new "Kind" but act the same as a dependency?
"Kind" has no influence on the algorithm, it is only used for building an error tree after the fact. I think it is reasonable for kind to develop a new variant, but I would lean toward FromDependencyOf
having a field for storing the kind of Requirement
. It will take experimenting with the code to figure out which ones cleaner.
Or should the
package_terms
be (in terms of how it is documented) "not A = 1, B = 2` instead of "A = 1, not B = 2"?
package_terms
is the critical part in terms of the algorithm. The documentation you referred to is correct, but skips over the relevant subtleties. Specifically, the difference between a version set and a Term
. A version set is a thing like < 3.0.0
, you can take its "complement" and get like >= 3.0.0
. For any concrete version v
vs.contains(v) == !vs.complement().contains(v)
. But this ends up being insufficient for pubgrubs algorithm, because it is ambiguous about what happens if no version is selected. (Exactly relevance to our problem.) A Term
add this bit of information. Term::Positive
does not kick in unless there is a version selected, but Term::Negative
is triggered by a lack of selected version.
A traditional dependency is expressed as { a : Term::Positive(set_a), b : Term::Negative(set_b) }
where as your new constraint wants to be { a : Term::Positive(set_a), b : Term::Positive(set_b.complement()) }
.
I will post this comment and work on the proptest side in the next comment.
As to testing...
You should definitely add a couple normal tests here, that check for the basic semantics of your new API. If a package is only mentioned in constraints, then it is not selected. However if the package is required by a dependency and conflict with the constraint then the build errors. This will be very helpful for making sure we've added the correct number of complement
calls in the code I just described. But any such examples are going to be superficial, not really touching the core of the algorithm. To make interesting bigger examples we will need to move on to the next kind of test.
We use proptest here to generate large interesting examples. And the only thing you need to add is one bit of information about each dependency, is it required or constrained. raw_dependency
can get a new any::<bool>()
for it, and then make the corresponding kind of dep here. Ok, so we have a random input how do we know it's right? There are a couple of proptests, but the big important one is that we compare the result to an SAT solver as defined here. Specifically converting deps into SAT var here, if the dep is a constraint then it could also be satisfied by there being no other version selected (not just by a matching version being selected or by the dependeror not being selected).
Thanks for the detailed explanation, I implemented the above and got it working for the simple tests (see this commit: https://github.com/baszalmstra/pubgrub/commit/46def288d27e8a35cb435b15727182ac64075f65). 🥳
I understand what to do with the SAT test but Im struggling with how to express "and no other versions are selected". Looking at the varisat crate also didn't help me. Would you be able to point me in the right direction?
I modified the Dependencies
enum to contain:
/// Container for all available package versions.
Known(DependencyConstraints<P, Requirement<VS>>),
a Requirement
is:
pub struct Requirement<VS: VersionSet> {
pub range: VS,
pub kind: RequirementKind,
}
#[derive(Clone, Copy, Debug, Eq, PartialEq)]
pub enum RequirementKind {
/// A dependency that is resolved as part of the solution.
Required,
/// Constrains the version of package but does not require inclusion of the package.
Constrained,
}
I used the term "requirement" instead of "dependency" (wordsmithing required as you said) but I didn't apply it throughout the code yet.
Any feedback at this point is already very welcome.
I understand what to do with the SAT test but Im struggling with how to express "and no other versions are selected". Looking at the varisat crate also didn't help me. Would you be able to point me in the right direction?
I've been thinking about this since you asked. I keep writing up simple solutions, only to realize they don't work. The best I have is https://github.com/pubgrub-rs/pubgrub/commit/12355f3ee04ad41ada66b92de9384a8ff609efb5 and that was easier to write than it was to explain. Basically I added a var represents no version of the package being selected.
Looking at your Requirement
+ RequirementKind
, why not all in one:
pub enum Requirement<VS: VersionSet> {
/// A dependency that is resolved as part of the solution.
Required<VS>,
/// Constrains the version of package but does not require inclusion of the package.
Constrained<VS>,
}
Im (still) working on using PubGrub to solve Conda packages. I've made a lot of progress but currently run into the problem of "constraints".
Conda packages can have two types of dependencies: regular dependencies and constraints. Regular dependencies are packages that need to be included in the solution but constraints are dependencies that don't have to be included, but if they are (via the dependency of another package), have to adhere to the constraints. I think this is similar to peer dependencies in NodeJS.
I initially modeled these as regular dependencies which I could strip from the solution after solving. However, sometimes these constraints are used to ensure a certain other package is not selected. For instance, python packages that are incompatible with
pypi
often have the constraint:pypy <0a0
. There is no version compatible with this since it denotes the lowest possible version. Using this as a dependency results in an errorDependencyOnTheEmptySet
which makes sense: that's exactly what the user intends.Is this something that could be supported with pubgrub? Is this something I could potentially add?