Open andreasabel opened 2 years ago
The legacy intersectVersionIntervals
is wrong. It destroys the ^>=
semantics. Please, implement proper Intervals, which preserve the ^>=
specification: then everyone wins. The refactoring of existing intervals was done to make adding proper ^>=
intervals simpler. (There should be hints in the comments).
Until then, just use the Legacy
module. It's there exactly for the backward compat.
@phadej, I am trying to understand your comment:
The legacy intersectVersionIntervals is wrong. It destroys the ^>= semantics.
According to the docs, https://cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-build-depends , the semantics of ^>= x
is >= x && < x.1
and that of ^>= x.y...
is >= x.y... && < x.(y+1)
. Both the legacy and the new VersionIntervals
module implement this semantics.
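The hard-bound reading quoted from the docs can be written down as a minimal sketch. It assumes versions are plain lists of components compared lexicographically; the names `majorUpperBound` and `withinCaret` are chosen here for illustration and the code does not use the Cabal library.

```haskell
-- Versions as lists of components, e.g. [1,3,1,0] for 1.3.1.0.
-- Assumption of this sketch: lexicographic list order stands in for version order.
type Version = [Int]

-- Upper bound implied by ^>= v under the hard reading:
--   ^>= x       gives  < x.1
--   ^>= x.y...  gives  < x.(y+1)
majorUpperBound :: Version -> Version
majorUpperBound (x : y : _) = [x, y + 1]
majorUpperBound [x]         = [x, 1]
majorUpperBound []          = [0, 1]  -- degenerate input, handled arbitrarily here

-- Does version w satisfy ^>= v when the upper bound is taken as hard?
withinCaret :: Version -> Version -> Bool
withinCaret v w = v <= w && w < majorUpperBound v
```

Under this reading `withinCaret [1,3] [1,3,5]` holds while `withinCaret [1,3] [1,4]` does not.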
Further, both modules use the same representation of intervals:
... increasing sequence of non-overlapping, non-empty intervals. ... canonical representation for the semantics of
VersionRange
s.
Canonical means that different VersionIntervals
have different semantics. Thus, the intervals need to be separated (i.e. non-overlapping and non-touching). Given that version intervals form a Boolean algebra the intersectVersionIntervals
can only be wrong when it is buggy. I checked but found no bugs.
Note that the requirements of correctness and canonicity leave no wiggle room. Ordered sequences of separated intervals are the one and only representation (there could be different, isomorphic ways of realizing ordered sequences or the individual intervals, but this does not matter here, and is not the case in the two implementations).
So I wonder what can be the difference between the old and the new implementation. Can you provide me evidence that points out the difference, an issue, a unit test, a regression test, an explanation, ...?
According to the docs, https://cabal.readthedocs.io/en/latest/cabal-package.html#pkg-field-build-depends ,
See https://cabal.readthedocs.io/en/3.6/cabal-project.html?highlight=allow-newer#cfg-field-allow-newer
The syntax also allows to prefix the dependee package with a modifier symbol to modify the scope/semantic of the relaxation transformation in a additional ways. Currently only one modifier symbol is defined, i.e. ^ (i.e. caret) which causes the relaxation to be applied only to ^>= operators and leave all other version operators untouched.
The ^>= x.y
and >= x.y && <x.(y+1)
are not the same. If you wish, latter is a hard bound, former is soft.
That is current state. I don't think that many would change soft bounds to hard when evidence arises that the "guess" was in fact correct. But to make the above bounds-specifications exactly the same you need to cleanup docs and remove all other details where they differ (i.e. ^
modifier in allow-newer
).
(Note: there was some debate whether cabal should have soft and hard bounds; it has now, but they aren't used. And to repeat, I don't think authors would revise soft bounds to hard ones when releases occur.)
If it's not clear it's really used, could we grep Hackage and verify the situation and, if at all possible, write an RFC and simplify as much as possible?
If it's not clear it's really used,
^>=
is definitely used. But I'm 100% sure it's used as if it had hard semantics.
Whether anyone uses allow-newer: ^foo
, we cannot know. I doubt.
EDIT: that (soft vs. hard upper bound distinction) is a feature that a small group of power users could use to great benefit (e.g. testing whether a new release of a lib breaks everything or not, and then making hard revisions where needed). But Hackage is a large group of non-power users: the feature is on the obscure side, not understood, not used "correctly".
Oh, I see. So we'd need a RFC, publicized on https://discourse.haskell.org, asking if anybody really needs it for anything, in particular for allow-newer
, given that removing it would simplify API and help tools that use the Cabal library and lessen cabal/tools/others maintenance burden. Who came up with the feature initially, so that we ask for a comment? @gbaz: what do you think?
Who came up with the feature initially
Soft-hard distinction? Herbert.
As for the new module Distribution.Types.VersionInterval, I cannot see where it reflects the distinction between soft and hard constraints. As I already said, it interprets the caret-operator ^>=
in exactly the same way as the "legacy" module: https://github.com/haskell/cabal/blob/ddb58fb8bf29cb065f1d85fa17b5c4f99b9d152a/Cabal/src/Distribution/Types/VersionInterval.hs#L96-L97
Also, it makes no efforts to recover ^>=
when translating back to VersionRange
: https://github.com/haskell/cabal/blob/ddb58fb8bf29cb065f1d85fa17b5c4f99b9d152a/Cabal/src/Distribution/Types/VersionInterval.hs#L294-L319
If one wanted to implement soft constraints, one would have to change/parametrize the translation from VersionRange
to VersionIntervals
, but at the semantic level, meaning VersionIntervals
, there is no soft/hard, it is simply a subset of the set of possible versions.
I do not have to repeat myself that a canonical representation of the semantics leaves no choice in the functionality of intersection/union etc.; all implementations are extensionally equivalent.
I notice though that the new implementation regresses asymptotic complexity from O(n+m)
to O(nm)
.
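For reference, a linear-time intersection over the canonical representation can be sketched as a merge of the two sorted sequences. This is a simplified model, not the code of either module: intervals are half-open `[lo, hi)` with an optional upper bound, whereas the real library also tracks inclusive and exclusive bounds. All names are illustrative.

```haskell
type Version = [Int]

-- Half-open interval [lo, hi); Nothing means "no upper bound".
data Interval = Interval Version (Maybe Version)
  deriving (Eq, Show)

-- Invariant assumed by this sketch: sorted, non-empty, pairwise separated.
type Intervals = [Interval]

-- Compare upper bounds, treating Nothing as +infinity.
upperLE :: Maybe Version -> Maybe Version -> Bool
upperLE _        Nothing  = True
upperLE Nothing  (Just _) = False
upperLE (Just a) (Just b) = a <= b

minUpper :: Maybe Version -> Maybe Version -> Maybe Version
minUpper a b = if upperLE a b then a else b

nonEmpty :: Version -> Maybe Version -> Bool
nonEmpty _  Nothing   = True
nonEmpty lo (Just hi) = lo < hi

-- Each step drops the head of one of the lists, so the work is O(n + m).
intersect :: Intervals -> Intervals -> Intervals
intersect [] _ = []
intersect _ [] = []
intersect xs@(Interval l1 u1 : xs') ys@(Interval l2 u2 : ys') =
  let lo = max l1 l2
      hi = minUpper u1 u2
      -- Advance past whichever interval ends first.
      rest | upperLE u1 u2 = intersect xs' ys
           | otherwise     = intersect xs ys'
  in if nonEmpty lo hi then Interval lo hi : rest else rest
```

The result is again sorted and separated because every output interval is contained in one interval from each input.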
Also, the new implementations are less perspicuous and harder to verify.
My proposal for Distribution.Types.VersionInterval
:
data
instead of tuple`) The discussion about soft/hard constraints is orthogonal to the OP and should be happening elsewhere.
As I already said, it interprets the caret-operator
Yes, it does. Because I hadn't time to do the further changes. I said that already.
Also, it makes no efforts to recover ^>= when translating back to VersionRange:
It cannot. The VersionInterval
type should preserve the ^>=
originated intervals as separate constructor.
If such constructor is added, then (IIRC) modifying intersectInterval
, unionUpper
, doesNotTouch
would be enough for rest of normalization to work. But I never got to make the final change.
Note how in soft semantics, foo: ^>=0.1 && <0.3
makes some sense. the bound behaves differently, depending if you don't allow-newer, allow-newer: foo
or allow-newer: ^foo
.
distinction could work, if say stackage was always run with allow-newer: ^*
, and the negative build results were propagated back as metadata revision. but to repeat myself, I don't think package maintainers would do that.
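The three behaviours of `^>=0.1 && <0.3` described above can be modelled in a few lines. This is a toy model under stated assumptions: `Range` and `Relax` are hypothetical stand-ins rather than Cabal's types, only conjunctions are covered, and `allow-newer: foo` is approximated as "ignore every upper bound".

```haskell
-- Versions as component lists, compared lexicographically.
type Version = [Int]

-- Tiny stand-in for a version-range syntax tree (hypothetical, not Cabal's VersionRange).
data Range
  = OrLater Version  -- >= v
  | Earlier Version  -- <  v
  | Caret Version    -- ^>= v
  | And Range Range
  deriving (Eq, Show)

-- The three settings from the discussion (hypothetical names).
data Relax
  = NoRelax         -- no allow-newer
  | RelaxAll        -- allow-newer: foo
  | RelaxCaretOnly  -- allow-newer: ^foo
  deriving (Eq, Show)

majorUpperBound :: Version -> Version
majorUpperBound (x : y : _) = [x, y + 1]
majorUpperBound [x]         = [x, 1]
majorUpperBound []          = [0, 1]

-- Membership of a version in a range under a given relaxation setting.
member :: Relax -> Range -> Version -> Bool
member _ (OrLater l) v = l <= v
member r (Earlier u) v = r == RelaxAll || v < u                      -- hard bound
member r (Caret l)   v = l <= v && (r /= NoRelax || v < majorUpperBound l)  -- soft bound
member r (And a b)   v = member r a v && member r b v

-- foo: ^>=0.1 && <0.3
example :: Range
example = And (Caret [0, 1]) (Earlier [0, 3])
```

In this model version 0.2 is rejected without relaxation and accepted with the caret-only relaxation, while 0.3 is accepted only when all upper bounds are relaxed.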
Since the new implementation did not add any functionality, but only prepared for functionality that might (or might not) come in the future, I think it should have stayed a PR or branch rather than being merged and released.
As I do not think that new functionality is coming soon, since it needs a proper proposal, discussion etc., I proposed to revert to the old, more efficient and perspicuous algorithms for now and save your reimplementation in some branch, so that it can be recovered when it is needed.
branch rather than being merged and released.
Fair, but then it would be very easy to veto almost any change to Cabal
.
more efficient
I benchmarked, the new ones are not inefficient.
Instead of going forward, you propose to go backwards. I'm more and more disappointed in the direction Cabal
development turns into.
Btw, I drop out of Cabal
development because pushing forward alone burnt me out. Even not participating actively keeps beating me, as any small steps I managed to do are proposed to be reverted. Thanks a lot.
@phadej: I'm sorry you take it that way.
Cabal is in a tough time, for various reasons, some cyclical, some unique, such as Covid and crossing the critical mass of popularity and of complexity, etc. A compounding factor of this crisis, but also its result, is the loss of Cabal's most passionate maintainers and very limited input from many of its old contributors. IMHO, we now struggle to minimise losses until we can rebound, so we are not at liberty to choose a "direction" in any fundamental way.
Regarding features, especially at times of crisis, IMHO every feature should be questioned and proposed for removal, whether it's fully implemented or not. The RFC process is to ensure that we either start understanding the purpose and commit to maintaining/finishing the feature, or we see that at this time there is not enough interest/capacity and we should cut our losses. It can always be brought back later and it will happen naturally if there is a strong need for it.
Please correct me if I got anything wrong. E.g., if the issue we are discussing is not a feature, but a proposal for a simplification, refactoring, a change that will enable deleting other things or delegating a lot of functionality to other tools. That would be a completely different situation and I'd understand your worry about it being "easy to veto almost any change".
I think that this module is a relatively minor component of cabal stuff, all told. Invoking big picture questions when what is in question is one concrete data structure in one relatively self-contained module seems to threaten to make this discussion even more disproportionate to the actual issues under consideration than it already is.
Here is what I would like to know. So the current situation is that if I round trip a VersionRange through VersionIntervals I lose caret information. This means that certain allow-newer syntax will not operate the same on the round-tripped VersionRange compared to the original. What I would like to understand is when, if ever, a roundtripping of this sort takes place or is expected to take place. I.e. if we do the work to restore this property, when, if ever will it matter? If it matters quite a bit, then we probably should just push the work forward. If it is just something that we would like to have because it feels more the "right thing" then maybe a different approach is more appropriate.
if ever, a roundtripping of this sort takes place or is expected to take place. I.e. if we do the work to restore this property, when, if ever will it matter?
E.g. in cabal-fmt
. Currently I have to use heuristics to try to recover caret versions. The heuristic works somehow because I assume how I write the version bounds (e.g. always using caret syntax when possible, except ...).
E.g. in
cabal-fmt
. Currently I have to use heuristics to try to recover caret versions.
I'd like to reiterate that it is mathematically impossible to get both of:
==
)As a simple example, consider i1 = ^>= 1.3 || ^>= 1.4
which has canonical form i2 = >= 1.3 && < 1.5
. These are semantically equivalent (in the boolean algebras of intervals), but clearly, i2
does not reify to i1
.
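The loss of caret information under canonicalisation can be reproduced with a small sketch. It assumes the hard reading of `^>=`, bounded half-open intervals only, and versions as lexicographically ordered lists; `canonical` and `caret` are illustrative names, not library functions.

```haskell
import Data.List (sortOn)

type Version = [Int]

-- Bounded half-open interval [lo, hi); enough for this example.
data Interval = Interval Version Version
  deriving (Eq, Show)

majorUpperBound :: Version -> Version
majorUpperBound (x : y : _) = [x, y + 1]
majorUpperBound [x]         = [x, 1]
majorUpperBound []          = [0, 1]

-- Hard reading of ^>= v as a plain interval.
caret :: Version -> Interval
caret v = Interval v (majorUpperBound v)

lowerOf :: Interval -> Version
lowerOf (Interval l _) = l

-- Union of intervals in canonical form: sort, then fuse neighbours
-- that overlap or touch, so the result is sorted and separated.
canonical :: [Interval] -> [Interval]
canonical = go . sortOn lowerOf
  where
    go (Interval l1 u1 : Interval l2 u2 : rest)
      | l2 <= u1  = go (Interval l1 (max u1 u2) : rest)
      | otherwise = Interval l1 u1 : go (Interval l2 u2 : rest)
    go is = is
```

Here `canonical [caret [1,3], caret [1,4]]` and `canonical [Interval [1,3] [1,5]]` yield the same single interval, so nothing in the canonical form records that two caret constraints were involved. By contrast, `^>= 1.3.1.0 || ^>= 1.4.1.0` stays as two intervals because they do not touch.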
If a more intensional (i.e., non-canonical) representation is wanted, e.g. for transforming constraints with caret-syntax to some (however defined) normal form, nothing stands in the way. Just not in Distribution.Types.VersionInterval
which clearly is dedicated to a canonical representation implementing a boolean algebra. This API was continuously present from at least 2.2 until it got broken in 3.6.
N.B. The current documentation of Distribution.Types.VersionInterval
still claims a canonical representation that helps to easily check subsumption, even though the necessary functionality to live up to this promise got lost.
https://github.com/haskell/cabal/blob/0abbe37187f708e0a5daac8d388167f72ca0db7e/Cabal-syntax/src/Distribution/Types/VersionInterval.hs#L54-L60
These are all reasons to restore the original functionality of Distribution.Types.VersionInterval
(now living in the Cabal-syntax package) and to move the new implementation into a new module like Distribution.Types.VersionInterval.Intensional
marked as WIP.
Ideally, the restoration would happen in a minor version of 3.6, so that some 3.6.x has the same API here as 2.2-3.4. However, as the module moved into a new package Cabal-syntax, I am not sure how the API continuity is handled there in general. @Mikolaj, what is the plan there?
Perhaps let's chat at the meeting today?
@andreasabel If you require that intervals are of form \x -> lb <= x && x <= ub
(with taking into account possible <
variants and no-upper bound), then sure, it is "mathematically impossible" to preserve ^>=
. However if you add ^>=
-like interval as a primitive interval, then I'm quite sure you can (even such same intervals would be structurally equal).
I argue that just having \x -> lb <= x && x <= ub
and claiming that that representation is canonical is wrong, as that representation doesn't preserve ^>=
semantics.
The (my) canonical representation of ^>= 1.3 || ^>= 1.4
would be something like (>= 1.3 && < 1.4) || ^>= 1.4
, still preserving the caret-upper bound, but realizing that ^>= 1.3
is effectively a >= 1.3 && <1.4
there, however ^>=1.4
is not >=1.4 && <1.5
.
Another example would be ^>=1.3.1.0 || ^>=1.4.1.0
, which is canonical: consider version 1.4
:
>=1.3.1.0 && <1.4 && >= 1.4.1.0 && <1.5
allow-newer: ^pkg-name
, as relaxing the ub of the first interval would include 1.4
>=1.3.1.0 && <1.4 && >= 1.4.1.0 && <1.5
won't allow 1.4
, as caret relaxing won't change that interval.
BTW, @gbaz, another place where (IMO incorrect) normalization happens is hackage-server
, resulting it showing
aeson (>=1.5.4.0 && <1.6 || >=2.0.0.0 && <2.1),
when author wrote
aeson ^>=1.5.4.0 || ^>=2.0.0.0
I'd argue it encourages to not use ^>=
, but rather write aeson >=1.5.4.0 && <2.1
(which is "worse", by being larger interval, hoping for aeson-1.6
to never happen).
I suggest Cabal
team to remove caret relaxing (and any other different interpretations of ^>=
bounds) and state that it's a syntactic sugar and nothing else, then I'll shut up about different semantics of ^>=
, as there wouldn't be any.
@phadej wrote:
However if you add
^>=
-like interval as a primitive interval, then I'm quite sure you can (even such same intervals would be structurally equal).
I'd say semantically (pertaining to the denotation) rather than structurally (pertaining to the representation/syntax) here.
Also I am using canonical in the sense that its Eq
decides semantic equality. In other words: same normal form iff same semantics.
I don't doubt that you can make a non-canonical representation (preserving caret) which can be further quotiented towards a canonical representation. But this canonical representation will still be what now resides in the Legacy
module.
Terminology aside, how did you e.g. implement version range subsumption in the presence of some allow-newer
ingredient? I would imagine that the interpretation of a range would be parametrized on such ingredient, but the target would still be the canonical/semantic representation. So I am not sure why we should get rid of the semantic representation.
Anyway, is there document specifying the semantics of caret-ranges with or without allow-newer
ingredients? And which tools currently implement it?
Also I am using canonical in the sense that its Eq decides semantic equality. In other words: same normal form iff same semantics.
I argue that is possible. The proof burden is on you to prove me wrong.
how did you e.g. implement version range subsumption in the presence of some allow-newer ingredient?
subsumes x y = x == union x y
And which tools currently implement it?
cabal-install
And I repeat: I suggest that you remove that from cabal-install
, declaring ^>=
is to be a syntactic sugar, removing allow-newer: ^pkg-name
functionality.
EDIT: the whole MajorBoundVersion
constructor could be removed, as there is little value of preserving syntax (Cabal
doesn't preserve parentheses anymore, nor WildcardVersion
. These weren't distinguished anywhere).
@phadej: I'm all for removals, but how good is that, which we lose? I get that it's rarely used and poorly advertised, but would it improve much if used widely?
@Mikolaj I said already above that I'm skeptical it would. Not in plenty-of-average-users setup as Hackage is: Special semantics of ^>=
is too fancy of a feature, as all the back-and-forth between me and Abel demonstrates.
(I'll be sad to see that go, but I'm one of the very few "power users" of cabal's constraint system).
@phadej: thank you. Embarrassingly, I've never once used (or even read with understanding) the birdy beak operator. :)
@andreasabel: let's discuss on the meeting today the removal of this feature. If the needs, means or proportion of power-users change, we can bring it back.
fwiw I remember there were plans of automatically (and/or on request) running matrix/hackage builders with --allow-newer=^pkg
and proposing revisions to maintainers on successful builds/tests
We can discuss more at the meeting, but I'd like to fix the documentation and encourage the correct use of caret. Removing an existing feature with a defined use case seems like a terrible idea. We can I'm sure work out a nice specification of version interval representations that preserves it.
One way or the other. Just let's make sure we don't let it rot.
@phadej wrote:
I argue that is possible. The proof burden is on you to prove me wrong.
how did you e.g. implement version range subsumption in the presence of some allow-newer ingredient?
subsumes x y = x == union x y
So since the caret-operator has two different semantics depending on the value of allow-newer^
, it is evident that subsumes
cannot be both sound and complete. It will either say False
when the reality is True
in some cases and some settings of allow-newer^
, or it will say True
when the reality is False
in some cases and some settings of allow-newer^
.
Qed.
I don't see any contradiction.
EDIT: there are intervals which are not comparable. There's nothing wrong with that, subsumption of version ranges is a partial order even without caret intervals. (subsumes should say True iff it's True in all interpretations, that is my point).
@phadej wrote:
there are intervals which are not comparable. There's nothing wrong with that, subsumption of version ranges is a partial order even without caret intervals.
(Of course, the subsumption order is a partial order, but we are not concerned about that. We are discussing whether subsumption is decidable.)
(subsumes should say True iff it's True in all interpretations, that is my point).
Ok, gotcha! This means that with i1 = (^>= 3.1)
and i2 = (>= 3.1 && < 3.2)
the answer of i2 subsumes i1
will be False
, but in the strict semantics (with no allow-newer
) it is True
. This is what I meant by it not being complete.
Let's call your subsumes
Intensional.subsumes
and the semantic (Legacy
) one Extensional.subsumes
, then we have that the Intensional
one is sound, Intensional.subsumes
implies Extensional.subsumes
, but not complete (i.e. the opposite implication).
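The gap between the two notions can be made concrete with a sketch. It models only single ranges whose upper bound is tagged as hard or soft, and takes "intensional" to mean "holds both with and without caret relaxation", following the reading given in the thread. The types and function names are hypothetical and not taken from either module.

```haskell
type Version = [Int]

-- Upper bound tagged with its origin: written explicitly (hard) or via ^>= (soft).
data Upper = Hard Version | Soft Version
  deriving (Eq, Show)

data Range = Range Version Upper
  deriving (Eq, Show)

data Mode = Strict | RelaxCaret
  deriving (Eq, Show)

-- Interpret a range as (lower bound, optional upper bound) in a given mode.
interp :: Mode -> Range -> (Version, Maybe Version)
interp _          (Range l (Hard u)) = (l, Just u)
interp Strict     (Range l (Soft u)) = (l, Just u)
interp RelaxCaret (Range l (Soft _)) = (l, Nothing)

-- Compare upper bounds, treating Nothing as +infinity.
upperLE :: Maybe Version -> Maybe Version -> Bool
upperLE _        Nothing  = True
upperLE Nothing  (Just _) = False
upperLE (Just a) (Just b) = a <= b

-- Interval a contains interval b (b assumed non-empty).
contains :: (Version, Maybe Version) -> (Version, Maybe Version) -> Bool
contains (la, ua) (lb, ub) = la <= lb && upperLE ub ua

-- Subsumption in the strict semantics only.
extensionalSubsumes :: Range -> Range -> Bool
extensionalSubsumes a b = contains (interp Strict a) (interp Strict b)

-- Subsumption that must hold in every interpretation.
intensionalSubsumes :: Range -> Range -> Bool
intensionalSubsumes a b =
  all (\m -> contains (interp m a) (interp m b)) [Strict, RelaxCaret]

i1, i2 :: Range
i1 = Range [3, 1] (Soft [3, 2])  -- ^>= 3.1
i2 = Range [3, 1] (Hard [3, 2])  -- >= 3.1 && < 3.2
```

With these definitions `extensionalSubsumes i2 i1` is `True` while `intensionalSubsumes i2 i1` is `False`, which is the incompleteness described above. In this model anything accepted by the intensional check is also accepted by the extensional one, because `Strict` is one of the modes checked.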
I'd argue that we still need the Extensional.subsumes
for some applications, and up to 3.4 it was found in D.T.VI
, and from 3.6 it is found in D.T.VI.Legacy
. I'd also argue that Legacy
isn't a good name and also this module should not be disposed of. I can concede to renaming this to D.T.VI.Semantic
or D.T.VI.Extensional
or D.T.VI.Canonical
(rather than reverting back to D.T.VI
which I originally proposed).
The current D.T.VI
should then explain how it departs from the pre-3.6 version of itself and what its applications are (in particular concerning the handling of the caret-operator).
I am still curious what the grammar/description of normal forms that preserve caret is. Would you be able to give a precise description?
I'd argue that we still need the Extensional.subsumes for some applications
Which are?
How about stating properties of say the constraint solver (and QuickChecking them)? Or even stating the abovementioned property of the intensional model?
(Btw. Why didn't you state the properties of the new D.T.VI
to verify your rewrite?)
Also, where do users of the old D.T.VI
go? Wouldn't the proof burden of migrability be on the one who changes the API?
Wouldn't the proof burden of migrability be on the one who changes the API?
That's why I made a Legacy module. I was perfectly aware of e.g. missing intersectVersionIntervals
, but I had plans to add it eventually (EDIT: or maybe not, as long as VersionRange
-> VersionIntervals
exist, it can be done "slow way") . So people who need it still (myself included) could use the Legacy
version.
Why didn't you state the properties of the new D.T.VI
There are plenty of properties, and in fact one which is commented out:
is one I argue should hold (but doesn't).
transformCaretUpper
is the function which implements allow-newer: ^pkg
relaxation. It's added in Legacy change commit, but that functionality existed already in cabal-install
codebase, just unnamed.
--- a/cabal-install/src/Distribution/Client/Dependency.hs
+++ b/cabal-install/src/Distribution/Client/Dependency.hs
@@ -513,18 +513,10 @@ relaxPackageDeps relKind (RelaxDepsSome depsToRelax0) gpd =
-- | Internal helper for 'relaxPackageDeps'
removeBound :: RelaxKind -> RelaxDepMod -> VersionRange -> VersionRange
-removeBound RelaxLower RelaxDepModNone = removeLowerBound
-removeBound RelaxUpper RelaxDepModNone = removeUpperBound
-removeBound relKind RelaxDepModCaret = hyloVersionRange embed projectVersionRange
- where
- embed (MajorBoundVersionF v) = caretTransformation v (majorUpperBound v)
- embed vr = embedVersionRange vr
-
- -- This function is the interesting part as it defines the meaning
- -- of 'RelaxDepModCaret', i.e. to transform only @^>=@ constraints;
- caretTransformation l u = case relKind of
- RelaxUpper -> orLaterVersion l -- rewrite @^>= x.y.z@ into @>= x.y.z@
- RelaxLower -> earlierVersion u -- rewrite @^>= x.y.z@ into @< x.(y+1)@
+removeBound RelaxLower RelaxDepModNone = removeLowerBound
+removeBound RelaxUpper RelaxDepModNone = removeUpperBound
+removeBound RelaxLower RelaxDepModCaret = transformCaretLower
+removeBound RelaxUpper RelaxDepModCaret = transformCaretUpper
On hackage-server
(which as of today hasn't made any move to consider ^>=
more than a short form of an interval >= ... <
), we are now using the Legacy
module (see https://github.com/haskell/hackage-server/pull/1038).
I think .Legacy
should be undeprecated while there is no solution for the missing {intersect,union}VersionIntervals
in Distribution.Types.VersionInterval
.
Background: I am looking to implement a check whether some version constraint subsumes another one, for https://github.com/hackage-trustees/hackage-cli/issues/28. We know that a subsumes b can e.g. be implemented as (a intersect b) = b.
I notice that up to 3.4, there was intersectVersionIntervals, and I traced this back to 2.4 (maybe even older). In 3.6 this function is gone, apparently due to a rewrite: https://github.com/haskell/cabal/blob/96ea35dca786dd54d64e55a00b0da7f63f7f6e99/Cabal/src/Distribution/Types/VersionInterval.hs#L3-L6
It seems that the whole module Distribution.Types.VersionInterval was moved to Distribution.Types.VersionInterval.Legacy and the former module was reimplemented from scratch, but supplying only parts of the former API. The legacy module is announced to be deleted in 3.8.
I propose to restore the deleted functionality to Distribution.Types.VersionInterval, at least the mathematically essential functions. Basically, all the algebraic operations present for the "syntactic" form VersionRange should be implemented for the "semantic" form VersionIntervals as well; both are Boolean algebras.
(For my use in hackage-cli that would mean I could now support 2.4, then upgrade to 3.4, then to 3.8---needing to skip 3.6.)