pubgrub-rs / pubgrub

PubGrub version solving algorithm implemented in Rust
https://pubgrub-rs.github.io/pubgrub/pubgrub/
Mozilla Public License 2.0
337 stars 30 forks source link

Experimental new Interval API with continuous space for versions #99

Closed mpizenberg closed 2 years ago

mpizenberg commented 3 years ago

Tests are not passing yet. It is quite a big change so some bugs were inserted (not on purpose ^^)

Eh2406 commented 3 years ago

It is a big change, I could use some pointers on where to start looking this over. What are the core new ideas? What advantages do they bring?

mpizenberg commented 3 years ago

The core idea of this API design attempt is to relax the Version trait enough to enable more features but still avoid making Range a user-implemented trait. Instead I went for a middle-ground of adding an Interval trait that has to be provided by the user. (PS for the time being I kept the Range name but would like to rename it to Ranges plural).

So the traits to implement by the user are the following.

pub trait Version: Clone + Ord + Debug + Display {
    /// Returns the minimum version.
    fn minimum() -> Bound<Self>;
    /// Returns the maximum version.
    fn maximum() -> Bound<Self>;
}

/// An interval is a bounded domain containing all values
/// between its starting and ending bounds.
pub trait Interval<V>: RangeBounds<V> + Debug + Clone + Eq + PartialEq {
    /// Create an interval from its starting and ending bounds.
    /// It's the caller responsability to order them correctly.
    fn new(start_bound: Bound<V>, end_bound: Bound<V>) -> Self;
}

The Interval type extends std::ops::RangeBounds so the user has to implement that trait as well.

pub trait RangeBounds<T> {
    pub fn start_bound(&self) -> Bound<&T>;
    pub fn end_bound(&self) -> Bound<&T>;

    pub fn contains<U>(&self, item: &U) -> bool { ... }

This means giving the start and the end of the interval as bounds (potentially infinite). And a contains method is provided with a default implementation (check if within bounds) but can be overridden. For example, for semantic versions, one could check if a bound contains a pre-release marker to alter the contains implementation.

But the cool change is that the Range type is still handled by ourselves. It is defined as follows.

pub struct Range<I, V> {
    segments: SmallVec<I>,
    phantom: PhantomData<V>,
}

So basically the same as before with the exception that it now has two type variables, one for Interval, one for Version. I could not figure out if I could just have the I and not the V because it was giving me compiler issues in the impl block. (ooohh I now realize that V could be an associate type of Interval instead of a type variable ... I changed all types everywhere ...)

impl<I: Interval<V>, V: Version> Range<I, V> {
   ...
}

Since Version is much less strict than before, it enables things like continuous types since there is no more bump (which pre-releases kind-of implies) and infinite double bounds, not only on the right side. And in our Range implementation, the contains method is implemented as before, except that instead of returning true when a version is inside one of the intervals, it delegates a call to Interval::contains and the user might decide that it does not contains that version (e.g. pre-releases if bounds do not contains pre-releases).

The downside is that the user cannot override the behavior for versions outside of a given interval, but I don't see that being a desirable property. The user also do not see the full Range when answering for contains, only one of its Interval so it would not be possible to introduce faithfully things like "no pre-release except if that's the only version satisfying a given constraint" (in PEP 440), because it would not know if potentially another segment in the range might contain a non-pre-release version. I think that would be an extremely rare case anyway.

But the huge advantage is that we keep the control of Range and its quite hard to implement functions like intersection and complement. Proof is I introduced a bug when rewriting it XD and now need to chase it ^^.

mpizenberg commented 3 years ago

To have a look at what's new, you can just look at the range_trait (misnamed) and version_trait modules. The rest of the changes are just type variables manipulations.

Eh2406 commented 3 years ago

I think this will be cleaner with a associated type for V. I will do a closer review when tests are passing. Then try to see how to support the Semver Crate with this approach.

Eh2406 commented 3 years ago

Thanks for the explanation! I had a few minutes to look this over. It is definitely an intriguing approach!

I am not seeing how to construct a Range that represents something like Semvers >=2.0.0-alfa, witch is "(non-pre versions >= 2.0.0) union (pre versions >= 2.0.0-alfa and < 2.0.1)". I can see how to make an Interval that stores whether its contains matches a pre, but Range only constructs I using I::new so I don't see how to get a Range that has some Intervals that do and some that don't.

mpizenberg commented 3 years ago

I am not seeing how to construct a Range that represents something like Semvers >=2.0.0-alfa

Well if your semver type is something like

impl SemVer {
    fn new(major: u32, minor: u32, patch: u32, pre: Option<String>) -> Self { ... }
}

Then you'd just call

SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded)

And within the SemInterval implementation, you can override the contains method to check for a pre-release in bound. Something like

impl RangeBounds<SemVer> for SemInterval {
    fn start_bound(&self) -> Bound<&SemVer> { ... }
    fn end_bound(&self) -> Bound<&SemVer> { ... }
    // this one we override the default implementation provided by `RangeBounds`
    fn contains(&self, item: &SemVer) -> bool {
        // 2.0.0-beta
        if item.is_prerelease() {
            // 2.0.0-alpha                                    2.0.0-alpha, 2.0.0
            self.start.is_prerelease() && item.within_bounds(self.start, self.start.without_prerelease())
                           // 4.5.3-beta,      4.5.3-alpha,  4.5.3-beta,         4.5.3           4.5.3-beta
                || self.end.is_prerelease() && item.lower_than(self.end) && item.without_prerelease().higher_than(self.end)
        } else {
            item.within_bounds(self.start, self.end)
        }
    }

}

It's more or less pseudo code so take that with a grain of salt as I've not thought through it all, just writing stuff as it comes to mind right now.

mpizenberg commented 3 years ago

PS, I've edited the above example code for contains() at least a couple times so don't trust what the email you received says XD

mpizenberg commented 3 years ago

The above code example might be horribly wrong, the important notion is that "if we can explain in english with simple rules if a version (pre-release, or not) is included in an interval by just looking at the interval bounds, then we can do it in code too"

mpizenberg commented 3 years ago

but Range only constructs I using I::new so I don't see how to get a Range that has some Intervals that do and some that don't.

Oh maybe I misinterpret the question. So the thing is that Range knows each interval bound. So the Range can check bounds of its stored intervals (which are non-intersecting) and once it reaches one that should contain the given version, instead of returning true it returns that_interval.contains(that_version).

See for example here: https://github.com/pubgrub-rs/pubgrub/blob/948d2c4ac3c862bca47f66b16e31d0ed376e9645/src/range_trait.rs#L293

I should definitely add some ascii segments drawing for each branch of that Range.contains() method.

Eh2406 commented 3 years ago

That all makes sense except for one small part. When I call SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded) I get a SemInterval but I need a Range to put in my DependencyConstraints.

mpizenberg commented 3 years ago

That all makes sense except for one small part. When I call SemInterval::new(Bound::Included(SemVer::new(2, 0, 0, Some("alpha"))), Bound::Unbounded) I get a SemInterval but I need a Range to put in my DependencyConstraints.

Well we expose the same API as before for Range which will create those intervals internally. So for example

    /// Set of all versions comprised between two given versions.
    /// The lower bound is included and the higher bound excluded.
    /// `v1 <= v < v2`.
    pub fn between(v1: impl Into<V>, v2: impl Into<V>) -> Self {
        let start_bound = Bound::Included(v1.into());
        let end_bound = Bound::Excluded(v2.into());
        Self {
            segments: SmallVec::one(I::new(start_bound, end_bound)),
            phantom: PhantomData,
        }
    }

That's why I needed to add the new() function the the Interval trait. I didn't yet found an alternative but maybe there is a smarter way to do that.

That's what you'll find here: https://github.com/pubgrub-rs/pubgrub/blob/948d2c4ac3c862bca47f66b16e31d0ed376e9645/src/range_trait.rs#L111

Eh2406 commented 3 years ago

The above code example might be horribly wrong, the important notion is that "if we can explain in english with simple rules if a version (pre-release, or not) is included in an interval by just looking at the interval bounds, then we can do it in code too"

It took a few readings before this sank in. But now that I get it, it sounds worth a try!

mpizenberg commented 3 years ago

It took a few readings before this sank in. But now that I get it, it sounds worth a try!

If the VersionReq type in the semver crate is what I believe it is, contains (Interval::contains not Range::contains) could literally be implemented by re-using their matches method. So something in the like of

self.into_version_req().matches(version)

Of course we could just rewrite the content of that function, that would be more efficient and simpler than doing the into_version_req conversion.

Eh2406 commented 3 years ago

So one thing we are going to need to be careful of is that range.contains(v) == false does not mean that range.complement().contains(v) == true. I am not sure all the places this may bight us. One that I have thought of is:

  1. We have a dependency foo="<5.0.0"
  2. We discover that there are no matching versions; unbenounced to us there is a 2.0.0-alfa but it is not compatible with this Req
  3. We backtrack
  4. We find a new dependency foo="<2.0.0-beta". If we are not careful we will decided that because 2.0.0-beta < 5.0.0 and there are no versions in "<5.0.0" then there are no version in "<2.0.0-beta"
mpizenberg commented 3 years ago

Hum, very true ... Breaking the strict meaning of contains in an asymmetric manner might cause issues with backtracking and saved NoVersion incompatibilities.

mpizenberg commented 3 years ago

So one thing we are going to need to be careful of is that range.contains(v) == false does not mean that range.complement().contains(v) == true. I am not sure all the places this may bight us. One that I have thought of is:

1. We have a dependency `foo="<5.0.0"`

2. We discover that there are no matching versions; unbenounced to us there is a `2.0.0-alfa` but it is not compatible with this Req

3. We backtrack

4. We find a new dependency `foo="<2.0.0-beta"`. If we are not careful we will decided that because `2.0.0-beta` < `5.0.0` and there are no versions in `"<5.0.0"` then there are no version in `"<2.0.0-beta"`

Do we know how dart pub handles such a situation?

mpizenberg commented 3 years ago

Do we know how dart pub handles such a situation?

So I cloned pub and added the following test:

  test('backtracking to pre-release after NoVersions', () async {
    await servePackages((builder) {
      builder.serve('a', '1.1.0', deps: {'b': '^1.0.0'});
      builder.serve('b', '1.1.0-alpha');
      builder.serve('a', '1.0.0', deps: {'b': '^1.1.0-alpha'});
    });

    await d.appDir({
      'a': '^1.0.0',
    }).create();
    await expectResolves(result: {
      'a': '1.1.0',
      'b': '1.1.0-alpha',
    });
  });

And that test is passing. Meaning they accept b 1.1.0-alpha even though a 1.1.0 asked for b ^1.0.0 so without pre-release. I suppose they do that to never generate a NoVersion incompat if only viable versions are pre-releases? Or maybe it's for another reason. Might be worth asking them?

mpizenberg commented 3 years ago

Hum, however, if I go one dependency deeper to force a backtracking to happen, it seems to make pub solver fail!

  test('backtracking to pre-release after NoVersions', () async {
    await servePackages((builder) {
      builder.serve('a', '1.1.0', deps: {'b': '^1.0.0'});
      builder.serve('b', '1.0.0', deps: {'c': '^1.0.0'});
      builder.serve('c', '0.9.0');
      builder.serve('b', '1.1.0-alpha');
      builder.serve('a', '1.0.0', deps: {'b': '^1.1.0-alpha'});
    });

    await d.appDir({
      'a': '^1.0.0',
    }).create();
    await expectResolves();
    // await expectResolves(result: {
    //   'a': '1.1.0',
    //   'b': '1.1.0-alpha',
    // });
  });

This test fails even though I'm just asking it expectResolves() which should pass if there is any solution. Which we know there are. Because a 1.0.0, b 1.1.0-alpha is a valid solution, or even a 1.1.0, b 1.1.0-alpha if it behaved like the other example above with one less dependency deep.

mpizenberg commented 3 years ago

I've created a PR on pub repo with the failing test asking for their thoughts: https://github.com/dart-lang/pub/pull/3038

Eh2406 commented 3 years ago

Do we know how dart pub handles such a situation?

I was going to do more research but I thought the answer was "by having more Range like rules around when pre releases match", looks like you did the research and found the real answer is "with bugs".

mpizenberg commented 2 years ago

Don't bother looking at those new commits. I'm trying stuff to figure out what is wrong with the intervals. For now, I've added logging, which lead me to discover that one assignment intersection was wrong at some point. Leading me to the fact that there is a problem with intersection since if I set

Then we have r1.intersection(r2) that is correct, returning > 1, < 2 but r2.intersection(r1) that is incorrect, returning >= 1, < 2.

Typically, this should have been spotted by the intersection_is_symmetric property test, but it's not, even if I increase very much the number of proptest cases. So I've been trying to improve the strategy for generation of ranges with bounds that may take all possible shapes, but without success for now.

mpizenberg commented 2 years ago

I've temporarily deactivated the serde feature and all tests (except with serde) are passing :)

I didn't found a way to trigger the symmetric intersection tests though, so I just fixed the bug in intersection :(. I didn't know it could be possible to be unhappy about fixing a long standing bug but that's what I feel after failing to trigger it with property tests ahah.

Before evaluating how much of a performance drain is the usage of bounds, I thus need to fix all the code related to the serde feature, to be able to run the benchmarks. I'll probably also have to re-encode the ron files. I'm not sure how I'll do that but will see.

mpizenberg commented 2 years ago

The code in this branch will also need a decent amount of cleaning. And while evaluating perf, verifying that the log:: usage does not impact performances.

Eh2406 commented 2 years ago

If there is a bug in intersection, let's make the fix a separate PR so we can get it out there ASAP.

mpizenberg commented 2 years ago

Nope no worry, there was a bug in my reimplementation of intersection for the Interval API in this PR. The main one that is released is fine.

On Sun, Aug 8, 2021, 16:05 Jacob Finkelman @.***> wrote:

If there is a bug in intersection, let's make the fix a separate PR so we can get it out there ASAP.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/pubgrub-rs/pubgrub/pull/99#issuecomment-894802948, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAWFOCLGP3OO522HKD5FHSDT32FJJANCNFSM47EEHCYA .

mpizenberg commented 2 years ago

This has been superseded by #112