API `Distribution.Types.VersionInterval` dropped `intersectVersionIntervals` etc. in 3.6

andreasabel commented 2 years ago

Background: I am looking to implement a check whether some version constraint subsumes another one, for https://github.com/hackage-trustees/hackage-cli/issues/28. We know that a subsumes b can e.g. be implemented as (a intersect b) = b.

I notice that up to 3.4, there was intersectVersionIntervals, and I traced this back to 2.4 (maybe even older). In 3.6 this function is gone, apparently due to a rewrite: https://github.com/haskell/cabal/blob/96ea35dca786dd54d64e55a00b0da7f63f7f6e99/Cabal/src/Distribution/Types/VersionInterval.hs#L3-L6

It seems that the whole module Distribution.Types.VersionInterval was moved to Distribution.Types.VersionInterval.Legacy and the former module was reimplemented from scratch, but supplying only parts of the former API. The legacy module is announced to be deleted in 3.8.

I propose to restore the deleted functionality to Distribution.Types.VersionInterval, at least the mathematically essential functions. Basically, all the algebraic operations present for the "syntactic" form VersionRange should be implemented for the "semantic" form VersionIntervals as well; both are Boolean algebras.

(For my use in hackage-cli that would mean I could now support 2.4, then upgrade to 3.4, then to 3.8---needing to skip 3.6.)

phadej commented 2 years ago

The legacy intersectVersionIntervals is wrong. It destroys the ^>= semantics. Please, implement proper Intervals, which preserve the ^>= specification: then everyone wins. The refactoring of existing intervals was done to make adding proper ^>= intervals simpler. (There should be hints in the comments).

Until then, just use the Legacy module. It's there exactly for the backward compat.

andreasabel commented 2 years ago

@phadej, I am trying to understand your comment:

The legacy intersectVersionIntervals is wrong. It destroys the ^>= semantics.

According to the docs, https://cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-build-depends , the semantics of ^>= x is >= x && < x.1 and that of ^>= x.y... is >= x.y... && < x.(y+1). Both the legacy and the new VersionIntervals module implement this semantics.

Further, both modules use the same representation of intervals:

... increasing sequence of non-overlapping, non-empty intervals. ... canonical representation for the semantics of VersionRanges.

Canonical means that different VersionIntervals have different semantics. Thus, the intervals need to be separated (i.e. non-overlapping and non-touching). Given that version intervals form a Boolean algebra the intersectVersionIntervals can only be wrong when it is buggy. I checked but found no bugs. Note that the requirements of correctness and canonicity leave no wiggle space. Ordered sequences of separated intervals are the and the only representation (there could be different, isormorphic ways of realizing ordered sequences or the individual intervals, but this does not matter here, and is not the case in the two implementation).

So I wonder what can be the difference between the old and the new implementation. Can you provide me evidence that points out the difference, an issue, a unit test, a regression test, an explanation, ...?

phadej commented 2 years ago

According to the docs, https://cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-build-depends ,

See https://cabal.readthedocs.io/en/3.6/cabal-project.html?highlight=allow-newer#cfg-field-allow-newer

The syntax also allows to prefix the dependee package with a modifier symbol to modify the scope/semantic of the relaxation transformation in a additional ways. Currently only one modifier symbol is defined, i.e. ^ (i.e. caret) which causes the relaxation to be applied only to ^>= operators and leave all other version operators untouched.

The ^>= x.y and >= x.y && <x.(y+1) are not the same. If you wish, latter is a hard bound, former is soft.

That is current state. I don't think that many would change soft bounds to hard when evidence arises that the "guess" was in fact correct. But to make the above bounds-specifications exactly the same you need to cleanup docs and remove all other details where they differ (i.e. ^ modifier in allow-newer).

(Note: there was some debate whether cabal should have soft and hard bounds; it has now, but they aren't used. And to repeat, I don't think authors would revise soft bounds to hard ones when releases occur.)

Mikolaj commented 2 years ago

If it's not clear it's really used, could we grep Hackage and verity the situation and, if only possible, RFC and simplify as much as possible?

phadej commented 2 years ago

If it's not clear it's really used,

^>= is definitely used. But I'm 100% it's used as if it had hard semantics.

Whether anyone uses allow-newer: ^foo, we cannot know. I doubt.

EDIT: that (soft vs. hard upper bound distinction) is a feature where small group of power users could use for great benefit (e.g. testing whether new release of a lib breaks everything or not, and then making hard revisions where needed). But Hackage is large group of non-power users: the feature is on the obscure side, not understood, not used "correctly".

Mikolaj commented 2 years ago

Oh, I see. So we'd need a RFC, publicized on https://discourse.haskell.org, asking if anybody really needs it for anything, in particular for allow-newer, given that removing it would simplify API and help tools that use the Cabal library and lessen cabal/tools/others maintenance burden. Who came up with the feature initially, so that we ask for a comment? @gbaz: what do you think?

phadej commented 2 years ago

Who came up with the feature initially

Soft-hard distinction? Herbert.

andreasabel commented 2 years ago

What the new module Distribution.Types.VersionInterval concerns, I cannot see where it reflects the distinction between soft and hard constraints. As I already said, it interprets the caret-operator ^.>= in exactly the same way as the "legacy" module: https://github.com/haskell/cabal/blob/ddb58fb8bf29cb065f1d85fa17b5c4f99b9d152a/Cabal/src/Distribution/Types/VersionInterval.hs#L96-L97 Also, it makes no efforts to recover ^>= when translating back to VersionRange: https://github.com/haskell/cabal/blob/ddb58fb8bf29cb065f1d85fa17b5c4f99b9d152a/Cabal/src/Distribution/Types/VersionInterval.hs#L294-L319

If one wanted to implement soft constraints, one would have to change/parametrize the translation from VersionRange to VersionIntervals, but at the semantic level, meaning VersionIntervals, there is no soft/hard, it is simply a subset of the set of possible versions.
I do not have to repeat myself that a canonical representation of the semantics leaves not choices on the functionality of intersection/union etc., all implementation are extensionally equivalent.

I notice though that the new implementation regresses asymptotic complexity from O(n+m) to O(nm). Also, the new implementations are less perspicuous and harder to verify.

My propsal for Distribution.Types.VersionInterval:

restore the old "legacy" module, except:
salvage improvements made in the representation (like data instead of tuple`)
salvage new functionality.

The discussion about soft/hard constraints is orthogonal to the OP and should be happening elsewhere.

phadej commented 2 years ago

As I already said, it interprets the caret-operator

Yes, it does. Because I hadn't time to do the further changes. I said that already.

Also, it makes no efforts to recover ^>= when translating back to VersionRange:

It cannot. The VersionInterval type should preserve the ^>= originated intervals as separate constructor.

phadej commented 2 years ago

If such constructor is added, then (IIRC) modifying intersectInterval, unionUpper, doesNotTouch would be enough for rest of normalization to work. But I never got to make the final change.

Note how in soft semantics, foo: ^>=0.1 && <0.3 makes some sense. the bound behaves differently, depending if you don't allow-newer, allow-newer: foo or allow-newer: ^foo.

distinction could work, if say stackage was always run with allow-newer: ^*, and the negative build results were propagated back as metadata revision. but to repeat myself, I don't think package maintainers would do that.

andreasabel commented 2 years ago

Since the new implementation did not add any functionality, but only prepared for functionality that might (or might not) come in the future, I think it should have stayed a PR or branch rather than being merged and released.

As I do not think that new functionality is coming soon, since it needs a proper proposal, discussion etc., I proposed to revert to the old, more efficient and perspicuous algorithms for now and save your reimplementation in some branch, so that it can be recovered when it is needed.

phadej commented 2 years ago

branch rather than being merged and released.

Fair, but then it would be very easy to veto almost any change to Cabal.

more efficient

I benchmarked, the new ones are not inefficient.

Instead of going forward, you propose to go backwards. I'm more and more disappointed in the direction Cabal development turns into.

Btw, I drop out of Cabal development because pushing forward alone burnt me out. Even not participating actively keeps beating me, as any small steps I managed to do are proposed to be reverted. Thanks a lot.

Mikolaj commented 2 years ago

@phadej: I'm sorry you take it that way.

Cabal is in a tough time, for various reasons, some cyclical, some unique, such as Covid and crossing the critical mass of popularity and of complexity, etc. A compounding factor of this crisis, but also its result, is the loss of Cabal's most passionate maintainers and very limited input from many of its old contributors. IMHO, we now struggle to minimise losses until we can rebound, so we are not at liberty to choose a "direction" in any fundamental way.

Regarding features, especially at times of crisis, IMHO every feature should be questioned and proposed for a removal, whether it's fully implemented or not. The RFC process is to ensure that we either start understanding the purpose and commit to maintaining/finishing the feature, or we see that at this time there is not enough interest/capacity and we should cut our loses. It can always be brought back later and it will happen naturally if there is a strong need for it.

Please correct me, if I got anything wrong. E.g., if the issue we are discussion is not a feature, but a proposal for a simplification, refactoring, a change that will enable deleting other things or delegating a lot of functionality to other tools. That would be a completely different situation and I'd understand your worry about it being "easy to veto almost any change".

gbaz commented 2 years ago

I think that this module is a relatively minor component of cabal stuff, all told. Invoking big picture questions when what is in question is one concrete data structure in one relatively self-contained module seems to threaten to make this discussion even more disproportionate to the actual issues under consideration than it already is.

Here is what I would like to know. So the current situation is that if I round trip a VersionRange through VersionIntervals I lose carat information. This means that certain allow-newer syntax will not operate the same on the round-tripped versionrange compared to the original. What I would like to understand is when, if ever, a roundtripping of this sort takes place or is expected to take place. I.e. if we do the work to restore this property, when, if ever will it matter? If it matters quite a bit, then we probably should just push the work forward. If it is just something that we would like to have because it feels more the "right thing" then maybe a different approach is more appropriate.

phadej commented 2 years ago

if ever, a roundtripping of this sort takes place or is expected to take place. I.e. if we do the work to restore this property, when, if ever will it matter?

E.g. in cabal-fmt. Currently I have to use heuristics to try to recover carat versions. The heuristic works somehow because I assume how I write the version bounds (e.g. always using carat syntax when possible, except ...).

andreasabel commented 2 years ago

E.g. in cabal-fmt. Currently I have to use heuristics to try to recover carat versions.

I'd like to reiterate that it is mathematically impossible to get both of:

a canonical representation of intervals (that decides sematic equality using ==)
a reification of intervals that restores user-written caret syntax.

As a simple example, consider i1 = ^>= 1.3 || ^>= 1.4 which has canonical form i2 = >= 1.3 && < 1.5. These are semantically equivalent (in the boolean algebras of intervals), but clearly, i2 does not reify to i1.

If a more intensional (i.e., non-canonical) representation is wanted, e.g. for transforming contraints with caret-syntax to some (however defined) normal form, nothing stands in the way. Just not in Distribution.Types.VersionInterval which clearly is dedicated to a canonical representation implementing a boolean algebra. This API was continuously present from at least 2.2 until it got broken in 3.6.

N.B. The current documentation of Distribution.Types.VersionInterval still claims a canonical representation that helps to easily checks subsumption, even though the necessary functionality to live up to this promise got lost. https://github.com/haskell/cabal/blob/0abbe37187f708e0a5daac8d388167f72ca0db7e/Cabal-syntax/src/Distribution/Types/VersionInterval.hs#L54-L60

These are all reasons to restore the original functionality of Distribution.Types.VersionInterval (now living in the Cabal-syntax package) and to move the new implementation into a new module like Distribution.Types.VersionInterval.Intensional marked as WIP.

Ideally, the restoration would happen in a minor version of 3.6, so that some 3.6.x has the same API here as 2.2-3.4. However, as the module moved into a new package Cabal-syntax, I am not sure how the API continuity is handled there in general. @Mikolaj, what is the plan there?

Mikolaj commented 2 years ago

Perhaps let's chat at the meeting today?

phadej commented 2 years ago

@andreasabel If you require that intervals are of form \x -> lb <= x && x <= ub (with taking into account possible < variants and no-upper bound), then sure, it is "mathematically impossible" to preserve ^>=. However if you add ^>=-like interval as a primitive interval, then I'm quite sure you can (even such same intervals would be structurally equal).

I argue that just having \x -> lb <= x && x <= ub and claiming that that representation is canonical is wrong, as that representation doesn't preserve ^>= semantics.

The (my) canonical representation of ^>= 1.3 || ^>= 1.4 would be something like (>= 1.3 || <1.4) && ^>=1.4, still preserving the caret-upper bound, but realizing that ^>= 1.3 is effectively a >= 1.3 && <1.4 there, however ^>=1.4 is not >=1.4 && <1.5.

Another example would be ^>=1.3.1.0 || ^>=1.4.1.0, which is canonical: consider version 1.4:

it's not allowed by >=1.3.1.0 && <1.4 && >= 1.4.1.0 && <1.5
it should be allowed by allow-newer: ^pkg-name, as relaxing the ub of the first interval would include 1.4
yet caret relaxing >=1.3.1.0 && <1.4 && >= 1.4.1.0 && <1.5 won't allow 1.4, as caret relaxing won't change that interval.

BTW, @gbaz, another place where (IMO incorrect) normalization happens is hackage-server, resulting it showing

aeson (>=1.5.4.0 && <1.6 || >=2.0.0.0 && <2.1),

when author wrote

aeson ^>=1.5.4.0 || ^>=2.0.0.0

I'd argue it encourages to not use ^>=, but rather write aeson >=1.5.4.0 && <2.1 (which is "worse", by being larger interval, hoping for aeson-1.6 to never happen).

I suggest Cabal team to remove caret relaxing (and any other different interpretations of ^>= bounds) and state that it's a syntactic sugar and nothing else, then I'll shut up about different semantics of ^>=, as there wouldn't be any.

andreasabel commented 2 years ago

@phadej worte:

However if you add ^>=-like interval as a primitive interval, then I'm quite sure you can (even such same intervals would be structurally equal).

I'd say semantically (pertaining to the denotation) rather than structurally (pertaining to the representation/syntax) here.

Also I am using canonical in the sense that its Eq decides semantic equality. In other words: same normal form iff same semantics.

I don't doubt that you can make a non-canonical representation (preserving caret) which can be further quotiented towards a canonical representation. But this canonical representation will still be what now resides in the Legacy module.

Terminology aside, how did you e.g. implement version range subsumption in the presence of some allow-newer ingredient? I would imagine that the interpretation of a range would be parametrized on such ingredient, but the target would still be the canonical/semantic representation. So I am not sure why we should get rid of the semantic representation.

Anyway, is there document specifying the semantics of caret-ranges with or without allow-newer ingredients? And which tools currently implement it?

phadej commented 2 years ago

Also I am using canonical in the sense that its Eq decides semantic equality. In other words: same normal form iff same semantics.

I argue that is possible. The proof burden is on you to prove me wrong.

how did you e.g. implement version range subsumption in the presence of some allow-newer ingredient?

subsumes x y = x == union x y

And which tools currently implement it?

cabal-install

And I repeat: I suggest that you remove that from cabal-install, declaring ^>= is to be a syntactic sugar, removing allow-newer: ^pkg-name functionality.

EDIT: the whole MajorBoundVersion constructor could be removed, as there is little value of preserving syntax (Cabal doesn't preserve parentheses anymore, nor WildcardVersion. These weren't distinguished anywhere).

Mikolaj commented 2 years ago

@phadej: I'm all for removals, but how good is that, which we lose? I get that it's rarely used and poorly advertised, but would it improve much if used widely?

phadej commented 2 years ago

@Mikolaj I said already above that I'm skeptical it would. Not in plenty-of-average-users setup as Hackage is: Special semantics of ^>= is too fancy of a feature, as all back-and-forth between me and Abel demonstrate.

(I'll be sad to see that go, but I'm a very few of "power users" of cabal's constraint system).

Mikolaj commented 2 years ago

@phadej: thank you. Embarrassingly, I've never once used (or even read with understanding) the birdy beak operator. :)

@andreasabel: let's discuss on the meeting today the removal of this feature. If the needs, means or proportion of power-users change, we can bring it back.

fgaz commented 2 years ago

fwiw I remember there were plans of automatically (and/or on request) running matrix/hackage builders with --allow-newer=^pkg and proposing revisions to maintainers on successful builds/tests

gbaz commented 2 years ago

We can discuss more at the meeting, but I'd like to fix the documentation and encourage the correct use of caret. Removing an existing feature with a defined use case seems like a terrible idea. We can I'm sure work out a nice specification of version interval representations that preserves it.

Mikolaj commented 2 years ago

One way or the other. Just let's make sure we don't let it rot.

andreasabel commented 2 years ago

@phadej wrote:

I argue that is possible. The proof burden is on you to prove me wrong.

how did you e.g. implement version range subsumption in the presence of some allow-newer ingredient?
subsumes x y = x == union x y

So since the caret-operator has two different semantics depending on the value of allow-newer^, it is evident that subsumes cannot be both sound and complete. It will either say False when the reality is True in some cases and some settings of allow-newer^, or it will say True when the reality is False in some cases and some settings of allow-newer^.

Qed.

phadej commented 2 years ago

I don't see any contradiction.

EDIT: there are intervals which are not comparable.There's nothing wrong with that, subsumption of version ranges is a partial order even without caret intervals. (subsumes should say True iff it's True in all interpretations, that's is my point).

andreasabel commented 2 years ago

@phadej wrote:

there are intervals which are not comparable.There's nothing wrong with that, subsumption of version ranges is a partial order even without caret intervals.

(Of course, the subsumption order is a partial order, but we are not concerned about that. We are discussion whether subsumption is decidable.)

(subsumes should say True iff it's True in all interpretations, that's is my point).

Ok, gotcha! This means that with i1 = (^>= 3.1) and i2 = (>= 3.1 && < 3.2) the answer of i2 subsumes i1 will be False, but in the strict semantics (with no allow-newer) it is True. This is what I meant by it not being complete.

Let's call your subsumes Intensional.subsumes and the semantic (Legacy) one Extensional.subsumes, then we have that then Intensional one is sound, Intensional.subsumes implies Extensional.subsumes, but not complete (i.e. the opposite implication).

I'd argue that we still need the Extensional.subsumes for some applications, and up to 3.4 it was found in D.T.VI, and from 3.6 it is found in D.T.VI.Legacy. I'd also argue that Legacy isn't a good name and also this module should not be disposed of. I can concede to renaming this to D.T.VI.Semantic or D.T.VI.Extensional or D.T.VI.Canonical (rather than reverting back toD.T.VI which I originally proposed).

The current D.T.VI should then explain how it departs from the pre-3.6 version of itself and what its applications are (in particular concerning in handling the caret-operator).

I am curious still what the grammar/description of normal forms that preserves caret. Would you be able to give a precise description?

phadej commented 2 years ago

I'd argue that we still need the Extensional.subsumes for some applications

Which are?

andreasabel commented 2 years ago

How about stating properties of say the constraint solver (and QuickChecking them)? Or even stating the abovementioned property of the intensional model?

(Btw. Why didn't you state the properties of the new D.T.VI to verify your rewrite?)

Also, where do users of the old D.T.VI go? Wouldn't the proof burden of migrability be on the one who changes the API?

phadej commented 2 years ago

Wouldn't the proof burden of migrability be on the one who changes the API?

That's why I made a Legacy module. I was perfectly aware of e.g. missing intersectVersionIntervals, but I had plans to add it eventually (EDIT: or maybe not, as long as VersionRange -> VersionIntervals exist, it can be done "slow way") . So people who need it still (myself included) could use the Legacy version.

Why didn't you state the properties of the new D.I.VI

There are plenty of properties, and in fact one which is commented out:

https://github.com/haskell/cabal/blob/2f5dcf9775390ffe0aa1255c8e439c29c75058a8/Cabal-tests/tests/UnitTests/Distribution/Version.hs#L185-L189

is one I argue should hold (but doesn't).

transformCaretUpper is the function which implementeds allow-newer: ^pkg relaxation. It's added in Legacy change commit, but that functionality existed already in cabal-install codebase, just unnamed.

--- a/cabal-install/src/Distribution/Client/Dependency.hs
+++ b/cabal-install/src/Distribution/Client/Dependency.hs
@@ -513,18 +513,10 @@ relaxPackageDeps relKind (RelaxDepsSome depsToRelax0) gpd =

 -- | Internal helper for 'relaxPackageDeps'
 removeBound :: RelaxKind -> RelaxDepMod -> VersionRange -> VersionRange
-removeBound RelaxLower RelaxDepModNone = removeLowerBound
-removeBound RelaxUpper RelaxDepModNone = removeUpperBound
-removeBound relKind RelaxDepModCaret = hyloVersionRange embed projectVersionRange
-  where
-    embed (MajorBoundVersionF v) = caretTransformation v (majorUpperBound v)
-    embed vr                     = embedVersionRange vr
-
-    -- This function is the interesting part as it defines the meaning
-    -- of 'RelaxDepModCaret', i.e. to transform only @^>=@ constraints;
-    caretTransformation l u = case relKind of
-      RelaxUpper -> orLaterVersion l -- rewrite @^>= x.y.z@ into @>= x.y.z@
-      RelaxLower -> earlierVersion u -- rewrite @^>= x.y.z@ into @< x.(y+1)@
+removeBound RelaxLower RelaxDepModNone  = removeLowerBound
+removeBound RelaxUpper RelaxDepModNone  = removeUpperBound
+removeBound RelaxLower RelaxDepModCaret = transformCaretLower
+removeBound RelaxUpper RelaxDepModCaret = transformCaretUpper

andreasabel commented 2 years ago

On hackage-server (which as of today hasn't made any move to consider ^>= more than a short form of an interval >= ... <), we are now using the Legacy module (see https://github.com/haskell/hackage-server/pull/1038).

I think .Legacy should be undeprecated while there is no solution for the missing {intersect,union}VersionIntervals in Distribution.Types.VersionInterval.

gbaz commented 1 year ago

haskell / cabal

API `Distribution.Types.VersionInterval` dropped `intersectVersionIntervals` etc. in 3.6 #7916