Capture implicit constraints explicitly

haskell / cabal

Official upstream development repository for Cabal and cabal-install

https://haskell.org/cabal


Capture implicit constraints explicitly #3729

Open alanz opened 8 years ago

alanz commented 8 years ago

Motivation. When specifying version bounds for software published to Hackage, there is a tension between specifying a conservative upper bound < 1.7 (because 1.7 has not been released yet, and COULD break your package) and not specifying an upper bound (because it's possible your software MIGHT work with 1.7). In an ideal world, we would have two separate types of bounds:

Explicit bounds, which are known at the time a package is authored and uploaded. For example, if I say p >= 0.2, it's because I definitely know that p-0.1 doesn't build with my package.
Implicit bounds, which are unknown at the time a package is authored and uploaded. Continuing the previous example, if p-0.3 is not released at the time I upload my package, I do not know if I should apply the bound p < 0.3 to my package. If I am being conservative, I should, because p-0.3 could be BC-breaking in a way I care about; but p-0.3 could work fine, in which case my upper bound might actually be p < 0.4 or p < 2.0 (who knows!)
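The PVP rule behind such an implicit bound is mechanical. The first two version components A.B form the major version, so a package tested against A.B.x gets the speculative bound < A.(B+1). Below is a minimal Python sketch, assuming versions are tuples of integers. `pvp_upper_bound` and `implicit_bound` are hypothetical helpers, not part of Cabal or Hackage:

```python
# Hypothetical helpers (not Cabal or Hackage APIs): compute the PVP-implied,
# i.e. implicit, exclusive upper bound for a dependency tested at `version`.
Version = tuple[int, ...]

def pvp_upper_bound(version: Version) -> Version:
    """Return the exclusive upper bound A.(B+1) implied by the PVP."""
    if not version:
        raise ValueError("empty version")
    major_a = version[0]
    major_b = version[1] if len(version) > 1 else 0  # treat "2" as "2.0"
    return (major_a, major_b + 1)

def render(version: Version) -> str:
    return ".".join(str(c) for c in version)

def implicit_bound(package: str, tested: Version) -> str:
    """Render the implicit constraint in build-depends syntax."""
    return f"{package} < {render(pvp_upper_bound(tested))}"

if __name__ == "__main__":
    # Built and tested against p-0.2: prints "p < 0.3".
    print(implicit_bound("p", (0, 2)))
```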

The behavior currently requested of package uploaders is to upload a package with a restrictive upper bound, and then relax it later (using the Cabal file editing interface on Hackage). But this is a lot of work for maintainers, and the work doesn't even benefit Stackage users, who aren't using the dependency solver at all. In fact, the speculative bound is a pain for them, because it causes Setup configure to reject a dependency even when it might have worked.

Approach. Implicit bounds required by the PVP should be tracked and recorded by Hackage, rather than being written to Cabal files. Package authors no longer have to write speculative upper bounds; Hackage will automatically pick an appropriate upper bound, and there will be a channel for successively relaxing it.

Goals. Here's what we want to achieve:

The application of implicit bounds to a package should be backwards-compatible, in the sense that all existing cabal-install clients should get these new constraints. (This imposes some major restrictions on how we implement this.)
The implicit constraints can be adjusted easily, possibly automatically. For example, if a Stackage nightly shows that a given implicit version bound can be moved up a notch, then this can happen via a webhook or other technical measure.

Details. This issue proposes to explicitly capture the implicit constraints, outside of the cabal file.

The simplest way to do this, keeping existing functionality intact but allowing the implicit constraints to move out of the cabal file, would be to capture them on Hackage by extending the metadata editing capability. This would allow either the package uploaders or the Hackage Trustees to modify both the explicit bounds, which stay in the cabal file, and the implicit ones, which are now stored in a new file associated with the package/version.

It would make sense for the initial implicit constraints to be optionally set by the uploader, perhaps using something like cabal-bounds, or alternatively captured in a new file parallel to the existing one. As they are implicit, however, this should not normally be necessary.

The Hackage web interface will show both kinds of constraint on the dependencies, highlighted in some way to differentiate them.

When the 00-index.tar.gz file is constructed for download, it should contain the original cabal file for each package, as well as a new file containing the current values of the implicit constraints.

Under normal usage, cabal install will combine the constraints in the cabal file with the new implicit ones (perhaps using the config.cabal mechanism), and configure as per normal.

When another installer such as stack operates, it can choose to not use the implicit constraints if it is using its own constraints, or to use them when it needs to invoke the cabal/hackage resolver.

Other notes. There is also a proposal for package sets. They appear to be orthogonal to this proposal.

ezyang commented 8 years ago

I think package sets are clearly orthogonal. The information you want to collect here is something that would feed into the creation of a package set, but it's not intrinsically specific to a particular package collection. In particular negative collections aren't sufficient to say "for p-1.2, you can't use q > 0.3".
One of the technical problems with Hackage is that the index grows monotonically; if we're editing bounds this growth is exacerbated. Did you talk about any ways to handle this?
"When another installer such as stack operates, it can choose to not use the implicit constraints if it is using its own constraints, or to use them when it needs to invoke the cabal/hackage resolver." I think there may need to be some changes to Cabal library, so that it can be told to ignore bounds that are written in the package description and just use the versions I told you about (via a --dependency flag).
What is the desired workflow for updating bounds? Say I'm a package author. What am I supposed to do, and when?

alanz commented 8 years ago

Note on terminology

<sclv> explicit (author-provided), implicit (pvp-implied),
          and improved (evidence driven revision of implicit)

phadej commented 8 years ago

I had an idea where one could already specify, in a .cabal file, both "hard" (known for sure) and "soft" (educated guess, based on the semantics of the PVP) bounds.

In that scenario, e.g. stack and/or Stackage could omit soft bounds at will. It would be an interesting experiment indeed!

The problem is that one would need to harden soft bounds when there's evidence they were actually correct guesses. Maybe I'm cynical, but I doubt that authors who leave out bounds today will harden soft ones (or make them explicit) tomorrow. So in that sense, external package metadata would work better.

However, I'd propose a less intrusive change (I still want upper bounds to be specified): let the external metadata say which bounds are actually soft. Then anyone could use the freedom of not taking them into account if they get in the way. This is the more conservative approach: if you don't know how to use the external metadata, you are still on the safe side, and no revisions are needed. (Maybe that's what @alanz proposed; I'm not sure what cabal-bounds refers to.) Obviously this needs tool support in cabal to be manageable. But I really want the reading where I see only the .cabal file to remain the safe one. External information would add possibilities, not remove them.
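One way to picture soft bounds in external metadata is a flag on each upper bound that a consumer may choose to ignore. This is a hypothetical Python model: the `Bound` type, its `upper_is_soft` flag and `effective_bound` are illustrative, not an existing Cabal or Hackage format. A consumer that knows nothing about the flag uses the default and keeps every bound:

```python
from dataclasses import dataclass
from typing import Optional

Version = tuple[int, ...]

@dataclass(frozen=True)
class Bound:
    """One build-depends range. `upper_is_soft` marks an upper bound that is
    only a PVP guess (hypothetical metadata, not an existing Cabal field)."""
    lower: Optional[Version]   # inclusive lower bound, None if absent
    upper: Optional[Version]   # exclusive upper bound, None if absent
    upper_is_soft: bool = False

def effective_bound(b: Bound, honour_soft: bool = True) -> Bound:
    """The bound a consumer enforces. The default keeps every bound, so a tool
    unaware of soft bounds behaves as if it only read the .cabal file."""
    if b.upper_is_soft and not honour_soft:
        return Bound(b.lower, None, False)
    return b

def admits(b: Bound, v: Version) -> bool:
    """True if version v lies within the bound (tuple order mirrors Cabal's)."""
    return (b.lower is None or v >= b.lower) and (b.upper is None or v < b.upper)
```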

There are technical problems.

I'd suggest that the external metadata be a file with cabal-file structure, which amends the contents of build-depends in either a conjunctive (restricting, not so useful) or a disjunctive (relaxing) way.
That would help solve the problem of how to reference the build-depends of a component foo under the conditionals if flag(bar) and if os(windows).
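The two amendment modes can be sketched by modelling a version range as a predicate. In this minimal Python sketch, the `between`/`amend` names and the `"and"`/`"or"` mode strings are assumptions for illustration. It shows that a conjunctive patch can only narrow the author's range, while a disjunctive one can only widen it. It does not address the harder problem of naming a build-depends under particular conditionals:

```python
from typing import Callable

Version = tuple[int, ...]
Range = Callable[[Version], bool]  # a version range modelled as a membership test

def between(lo: Version, hi: Version) -> Range:
    """The range >= lo && < hi."""
    return lambda v: lo <= v < hi

def amend(base: Range, patch: Range, mode: str) -> Range:
    """Amend the author's range with external metadata.

    "and" (conjunctive) can only restrict the author's range;
    "or" (disjunctive) can only relax it.
    """
    if mode == "and":
        return lambda v: base(v) and patch(v)
    if mode == "or":
        return lambda v: base(v) or patch(v)
    raise ValueError(f"unknown mode: {mode!r}")
```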

There have also been other ideas for including external metadata. E.g. the provides proposal could be relatively simply superseded by Hackage-wide metadata about conflicting packages. That, however, would take power away from the authors. That's contrary to the previous "external data adds options" point, but in this case it does add information which cannot live in the .cabal files. So maybe provides: is still the superior approach.

In the long run, when it becomes possible to sign uploads to Hackage, one would need to sign metadata updates too. That additional security comes with additional costs, e.g. one couldn't make a revision online anymore.

alanz commented 8 years ago

@ezyang thanks for the questions. I think they can only be addressed properly once we have a better idea of the appropriate approach, and of where the constraints should be captured. They will obviously shape the continuing discussion too.

@phadej thanks for the feedback.

To me the thorniest problem is getting something like this to work simultaneously with stack and with the legacy cabal-install versions, which are spread all over the place.

That said, I think the change and deployment process in stack is a lot faster, so perhaps the proposed identification of soft constraints via a separate file could be done more easily there.

I am in favour of a very clear separation between explicit and implicit constraints, though, and believe that the explicit ones belong to the developer, and as such should be in the developer's version of the cabal file, uploaded to Hackage. The implicit ones then belong outside the cabal file, or rather outside the cabal file uploaded to Hackage; they may re-appear in the one downloaded from Hackage, in the 00-index.tar.gz file.

If the versions bounds are important for a conditional branch in the cabal file, and differ from the implicit ones derived via the proposed new process, then they should become explicit constraints.

Something that is not clear to me is exactly when the constraints from the cabal file in 00-index.tar.gz get used, and when the ones in the package tarball get used.

I presume the solver uses the ones in the index (together with the list of currently installed packages, when not using new-build), but that when one of the dependency packages gets built, the cabal configure step will use the one in the tarball (as uploaded to Hackage by the developer).

This can potentially allow the indexed cabal files to have the constraints include the implicit ones.

In this case, since the package local cabal file is less constrained than the one downloaded, but preference is given to currently installed packages (is it?), the configure step should still succeed.

Since the implicit constraints can now be managed more freely and holistically (not necessarily requiring a new package upload by a developer), it is more likely that at any given time they will match the surrounding ecosystem, and so build plans will be able to be constructed.

A side effect of managing the implicit constraints on hackage is that multiple sources of information can be combined to semi-automatically update version bounds. This can include e.g. a webhook from other constraint management systems such as stackage to provide evidence that a given package can be successfully used with a specific version of a dependency, later than the current implicit upper bound for the package.

ezyang commented 8 years ago

Something that is not clear to me is exactly when the constraints from the cabal file in 00-index.tar.gz get used, and when the ones in the package tarball get used.

The index cabal file is always used. The solver reads out the info from the index, and before we run the setup script we overwrite the cabal file in the unpacked source dir with the index cabal file.

But I don't understand what you say next, because implicit constraints are still constraints, so won't they get fulfilled regardless?

alanz commented 8 years ago

@ezyang the point I am driving at is the separate management of constraints, while preserving backward compatibility.

Hence, if I understand things correctly, we can have the dev always work with a cabal file having hard constraints only. When this gets uploaded to hackage, and the index file gets downloaded by a user of that package, the hackage server can add the current best known implicit constraints to the index file, which will then be used on the local machine.

Further, according to experiments and discussion on IRC, a given package dependency list in a cabal file can have the same package listed more than once, and the solver will use the intersection of the constraints.
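The intersection behaviour described above can be illustrated with a small checker. This is a minimal Python sketch, assuming a simplified entry grammar (`name op version`, optionally joined by `&&`) rather than Cabal's full version-range syntax. `parse_entry` and `satisfies` are hypothetical helpers, and this is not how the solver is implemented:

```python
import operator
import re

Version = tuple[int, ...]

# Simplified grammar assumed for illustration: "name [op version [&& op version]]".
# Real Cabal syntax (||, ==1.2.*, ^>=, ...) is richer than this.
_ENTRY = re.compile(r"^\s*([A-Za-z0-9-]+)\s*(.*)$")
_CLAUSE = re.compile(r"^\s*(>=|<=|==|<|>)\s*([0-9]+(?:\.[0-9]+)*)\s*$")
_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}

def parse_entry(entry: str):
    """'ghc-exactprint >= 0.5.2.1' -> ('ghc-exactprint', [('>=', (0, 5, 2, 1))])."""
    m = _ENTRY.match(entry)
    if m is None:
        raise ValueError(f"unsupported entry: {entry!r}")
    name, rest = m.group(1), m.group(2)
    clauses = []
    for clause in rest.split("&&"):
        if not clause.strip():
            continue  # bare package name: no constraint
        c = _CLAUSE.match(clause)
        if c is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        clauses.append((c.group(1), tuple(int(x) for x in c.group(2).split("."))))
    return name, clauses

def satisfies(entries: list[str], package: str, version: Version) -> bool:
    """True if `version` meets every clause of every entry naming `package`,
    i.e. it lies in the intersection of all ranges listed for that package."""
    for entry in entries:
        name, clauses = parse_entry(entry)
        if name == package and not all(_OPS[op](version, v) for op, v in clauses):
            return False
    return True
```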

So we can relatively easily add a section to a cabal file with implicit constraints, clearly marked as such.

e.g.

executable foobazbar
  main-is:             Main.hs
  build-depends:
                ghc-exactprint >= 0.5.2.1

               -- START OF HACKAGE PROVIDED IMPLICIT CONSTRAINTS
               , ghc-exactprint  < 0.5.3
               -- END OF HACKAGE PROVIDED IMPLICIT CONSTRAINTS

  default-language:    Haskell2010
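A block like this is straightforward to generate mechanically. The following Python sketch reuses the marker comments from the example above. `insert_implicit` is a hypothetical helper working on plain text, not a Cabal API. It assumes a layout like the example's: the section header sits in column 0, and build-depends already lists at least one dependency. Re-running it replaces an earlier block instead of adding a second one:

```python
START = "-- START OF HACKAGE PROVIDED IMPLICIT CONSTRAINTS"
END = "-- END OF HACKAGE PROVIDED IMPLICIT CONSTRAINTS"

def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())

def insert_implicit(cabal: str, section: str, constraints: list[str]) -> str:
    """Insert (or replace) a delimited block of implicit constraints at the end
    of `section`'s build-depends field (hypothetical helper)."""
    lines = cabal.splitlines()
    start = next((i for i, l in enumerate(lines)
                  if _indent(l) == 0 and l.strip() == section), None)
    if start is None:
        raise ValueError(f"section not found: {section}")
    # The section ends at the next non-blank, non-comment line in column 0.
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].strip() and _indent(lines[i]) == 0
                and not lines[i].startswith("--")), len(lines))
    body = lines[start + 1:end]

    # Drop a previously inserted block, so that re-running is idempotent.
    kept, skipping = [], False
    for line in body:
        if line.strip() == START:
            skipping = True
        if not skipping:
            kept.append(line)
        if line.strip() == END:
            skipping = False
    body = kept

    field = next((i for i, l in enumerate(body)
                  if l.strip().lower().startswith("build-depends:")), None)
    if field is None:
        raise ValueError(f"no build-depends in section: {section}")
    field_indent, last, item_indent = _indent(body[field]), field, None
    for i in range(field + 1, len(body)):
        if not body[i].strip():
            continue                      # blank lines may sit inside the field
        if _indent(body[i]) <= field_indent:
            break                         # reached the next field
        last = i
        if item_indent is None:
            item_indent = _indent(body[i])
    pad = " " * (item_indent if item_indent is not None else field_indent + 4)
    block = [pad + START] + [pad + ", " + c for c in constraints] + [pad + END]
    body[last + 1:last + 1] = block
    return "\n".join(lines[:start + 1] + body + lines[end:]) + "\n"
```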

alanz commented 8 years ago

These are my planned steps, subject to change based on feedback from others or on implementation experience:

[ ] Method to insert hackage-managed implicit constraints into a cabal file, per section, clearly delimited by comments.
[ ] Add new data structures to hackage-server to capture the implicit constraints. These will be indexed by package/version/component
[ ] Add a process to insert the implicit constraints into each cabal file when it is modified, or to sweep over all cabal files and generate the inserted ones as a batch.
[ ] Update the 00-index.tar.gz generation process to use the modified cabal files
[ ] Modify the hackage frontend to display the explicit and implicit constraints, highlighting the differences
[ ] Modify the package maintainers' page to allow explicitly editing the implicit constraints
[ ] Provide a mechanism for a developer to access the current implicit bounds for their package, while working on their code. Perhaps something as simple as a config.cabal or equivalent file.
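The storage and batch-sweep steps might be sketched as follows. The sketch uses an in-memory map keyed by package/version/component, plus a sweep that rewrites each cabal file and packs the results into a gzipped tar using `<name>/<version>/<name>.cabal` paths, as the index tarball does. Everything here is hypothetical Python (hackage-server itself is written in Haskell). The `apply` callback stands in for whatever insertion method the first step provides:

```python
import io
import tarfile
from typing import Callable

# Hypothetical in-memory stand-in for the hackage-server state described above:
# (package, version, component) -> implicit constraint strings.
ImplicitStore = dict[tuple[str, str, str], list[str]]

def set_implicit(store: ImplicitStore, package: str, version: str,
                 component: str, constraints: list[str]) -> None:
    store[(package, version, component)] = list(constraints)

def implicit_for(store: ImplicitStore, package: str,
                 version: str) -> dict[str, list[str]]:
    """Implicit constraints for every component of one package version."""
    return {comp: cs for (p, v, comp), cs in store.items()
            if (p, v) == (package, version)}

def build_index(cabal_files: dict[tuple[str, str], str], store: ImplicitStore,
                apply: Callable[[str, dict[str, list[str]]], str]) -> bytes:
    """Batch sweep: rewrite every cabal file with its implicit constraints via
    `apply`, then pack the results into a gzipped tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for (package, version), text in sorted(cabal_files.items()):
            data = apply(text, implicit_for(store, package, version)).encode("utf-8")
            info = tarfile.TarInfo(name=f"{package}/{version}/{package}.cabal")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```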

Phase 2

Provide a wider mechanism to update the constraints, based on different evidence sources.

phadej commented 8 years ago

Goals

The implicit constraints can be adjusted easily, possibly automatically. For example, if a Stackage nightly shows that a given implicit version bound can be moved up a notch, then this can happen via a webhook or other technical measure.

What if there existed a service, outside of Hackage and cabal-install, which would build packages that have restrictive upper bounds with those bounds relaxed, and then offer the maintainer a button to make a revision, while

showing the changelog of the new version of the restricted dependency
link to the diff (like http://hdiff.luite.com/cgit/http-api-data/diff?id=0.2.3&id2=0.2.4)
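Such a service could decide what to try by comparing each upper bound with the newest release on Hackage. This is a minimal Python sketch, assuming exclusive upper bounds and a precomputed map of latest versions. `relaxation_candidates` and `revision_note` are hypothetical names. The trial build itself (e.g. with --allow-newer) and the revision button are out of scope:

```python
from typing import Optional

Version = tuple[int, ...]

def _show(v: Version) -> str:
    return ".".join(map(str, v))

def relaxation_candidates(upper_bounds: dict[str, Optional[Version]],
                          latest: dict[str, Version]) -> list[str]:
    """Dependencies whose newest release is excluded by the current exclusive
    upper bound: the ones worth a trial build with relaxed bounds."""
    return sorted(dep for dep, upper in upper_bounds.items()
                  if upper is not None and dep in latest and latest[dep] >= upper)

def revision_note(package: str, dep: str, upper: Version, newest: Version) -> str:
    """Text a hypothetical service could show beside a 'make revision' button."""
    return (f"{package}: {dep} < {_show(upper)} excludes {dep}-{_show(newest)}; "
            "check its changelog and diff before relaxing the bound")
```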

cc @hvr

I'm really :-1: on complicating where bounds come from. If at all possible, everything should be in the .cabal file.

In the modified proposal I'm :-1: :-1: :-1: on "via a webhook or other technical measure": major bumps occur for a reason. There have been semantic changes not caught by compilation errors (e.g. deepseq-1.4).

In this light, a service that builds packages with --allow-newer (and learns which bounds are non-speculative) would solve the same problem, wouldn't it?

alanz commented 8 years ago

@phadej I think the starting point for any "webhook or technical measure" would be that it is a hackage management decision, firstly. And that the criteria for doing any kind of update would have to be analysed carefully before allowing this.

The main point is to have an evidence-based approach. If the origin is a build process with a certain level of quality in terms of ensuring that things actually work (such as Stackage), then a data point saying "package p-0.1.0.0 is known to work with q-0.2.3.0" means we can perhaps raise its implicit upper bound from q < 0.2.3 to q < 0.2.4.
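The evidence step can be sketched as computing the smallest raised bound that admits the version known to work. This is a minimal Python sketch, assuming exclusive upper bounds and tuple versions compared component by component, so 0.2.3.0 sorts after 0.2.3. For instance, with q-0.2.3.0 known to work under q < 0.2.3, it proposes q < 0.2.4. `proposed_upper_bound` is hypothetical, and its result is only a proposal for a maintainer or trustee to accept:

```python
from typing import Optional

Version = tuple[int, ...]

def proposed_upper_bound(current: Version, works_with: Version) -> Optional[Version]:
    """Smallest exclusive upper bound, at the granularity of `current`, that
    admits `works_with`; None if `current` already admits it."""
    if not current:
        raise ValueError("empty upper bound")
    if works_with < current:
        return None
    # Truncate (or zero-pad) the working version to len(current) components,
    # then bump the last one.
    n = len(current)
    bumped = list(works_with[:n]) + [0] * (n - len(works_with))
    bumped[-1] += 1
    return tuple(bumped)
```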

hvr commented 8 years ago

@alanz

First of all, I disagree with the premise that

The present requested behavior from package uploaders is to upload a package with a restrictive upper bound, and then relax it (using the Cabal file editing interface on Hackage). But this is a lot of work for maintainers

It is not a lot of work, especially if we improve the tooling a bit, to the point where this is doable via a convenient interface (@phadej's "press the button" idea). But relaxing upper bounds without a human consciously signing off on it makes no sense, as then we wouldn't need semantic versioning in the first place (or version numbers, for that matter: something like backpack would then fully suffice to describe API compatibility, but it clearly can't).

The package maintainer is usually the person most qualified (after having inspected the necessary data, i.e. changelogs, build reports, compile warnings, test results, etc.) to make an informed choice about whether it's safe to bump an upper bound to allow the dependency to be upgraded.

For that matter, Hackage Trustees almost never dare to relax upper bounds for packages they don't know well enough, unless it's critical and the maintainer is unreachable. And that's not going to change, even with your proposed new architecture.

That being said, how do we ensure that package authors keep caring about maintaining the metadata on Hackage, rather than starting to neglect it because "it works for stack users, why should I care about cabal users?"? If we make it too easy for Stackage to ignore Hackage's PVP regime, I'm worried this will slowly leave Hackage in a worse situation than it is now, where authors are forced to take upper bounds into account, specifically because Stackage currently needs to honour them. In your proposed scheme, however, it appears to become easier for Stackage to elude PVP bounds, and consequently for authors to neglect active bounds management on Hackage.

Or put differently, official Stackage snapshots would be able to represent solutions which would not be deducible from the version constraints that Hackage clients need to honour. I.e. Stackage snapshots would represent invalid install-plans according to Hackage's constraint metadata. That situation seems fundamentally wrong to me. In my view, any Stackage snapshot ought to be a valid frozen install-plan solution compatible with the constraints in Hackage's state at a certain point in time.
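The requirement that a snapshot be a valid install plan can be checked mechanically. This is a minimal Python sketch, assuming each pinned package's bounds are given as (inclusive lower, exclusive upper) pairs. `snapshot_violations` is a hypothetical helper, not part of Stackage or Hackage tooling:

```python
from typing import Optional

Version = tuple[int, ...]
Range = tuple[Optional[Version], Optional[Version]]  # (inclusive lower, exclusive upper)

def _show(v: Version) -> str:
    return ".".join(map(str, v))

def snapshot_violations(snapshot: dict[str, Version],
                        depends: dict[str, dict[str, Range]]) -> list[str]:
    """Return every dependency edge of a frozen plan that violates the bounds
    recorded for the pinned packages; an empty list means the plan is valid."""
    problems = []
    for pkg, deps in sorted(depends.items()):
        if pkg not in snapshot:
            continue                      # package not part of this snapshot
        for dep, (lo, hi) in sorted(deps.items()):
            v = snapshot.get(dep)
            if v is None:
                problems.append(f"{pkg}: {dep} missing from snapshot")
            elif (lo is not None and v < lo) or (hi is not None and v >= hi):
                problems.append(f"{pkg}: {dep} {_show(v)} out of range")
    return problems
```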

alanz commented 8 years ago

Started, see https://github.com/alanz/cabal/commit/4c8b5fd9ddc4375495ee60026a5ae42df905625c

alanz commented 8 years ago

@hvr

Answering from the social dimension, which can only be a guess:

I do not believe anyone is against the concept of semantic versioning, as captured in the PVP. I believe that the social problem comes about where the requirements of one package constraint management system (hackage) cause a burden on people who are using a different package management system (stack). Even though both use hackage underneath.

Further, I believe that the fundamental premise of relying on human diligence to maintain upper bounds is flawed without some kind of (preferably machine-checked) verification process. This is an aside, though.

The goals of this issue are actually quite limited. What I would like to see is that effectively nothing changes from what we have now, except that there is a mechanism to allow implicit constraints to be managed at hackage, for those package maintainers that decide to use this. If someone wants to manage upper bounds in their cabal files, they can.

But it allows cabal-install users to ALSO get the benefit of restrictive upper bounds, where a developer who is primarily using stack does not see the need to apply upper bounds in their cabal files.

And it clears the way for possible experimentation at the ecosystem level with how to manage the upper bounds.

phadej commented 8 years ago

I'd like to find a way to conduct this experiment without changing anything in the Cabal library, and without altering the Hackage software.

We could experiment on Hackage only, by putting the original build-depends into commented blocks like:

-- BOUNDS APPLIED:
build-depends: transformers >= 0.4 && <0.6
-- AUTHOR ORIGINAL:
-- build-depends: transformers >= 0.4
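Tooling that wants the author's original bounds back could read them out of such a comment block. This is a minimal Python sketch, assuming the marker text shown above and that the block ends at the first non-comment line. `author_original` is a hypothetical helper:

```python
def author_original(cabal: str, marker: str = "-- AUTHOR ORIGINAL:") -> list[str]:
    """Return the commented-out lines following `marker`, with the comment
    prefix removed; empty if the file has no such block."""
    out, inside = [], False
    for line in cabal.splitlines():
        stripped = line.strip()
        if stripped == marker:
            inside = True
        elif inside:
            if not stripped.startswith("--"):
                break                     # first non-comment line ends the block
            out.append(stripped[2:].strip())
    return out
```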

To my understanding, we already have some support for this with gen-bounds, so only slight changes would be required to cabal-install (or to stack, if people use it to upload).

Then any tooling could use this information at will, e.g. stackage-curator and stack. They already provide a copy of the index through all-cabal-files, and stack generates 00-index.tar based on it. I guess stack could generate 00-index.tar.gz using the author-original build-depends as well (in case the bounds added at upload are incorrect).

If (and when!) the functionality is used in Stackage tooling, I, as a Hackage Trustee, will develop the tooling to import positive evidence (build successes showing that bounds can be relaxed) into Hackage as well.

I personally will be happy to participate in an experiment as specified above. I could actually leave out all of the upper bounds, as I usually make packages support all the newest versions of dependencies at the moment of release. I.e. even if nobody used the "author specified" information, it might save me a bit of hacking time.

alanz commented 8 years ago

@phadej

I do not believe that any change is required on the client or Stackage side, since effectively nothing is changing: they will still get an index file with cabal files containing the requisite dependencies.

What would be possible as an experiment is for the Hackage Trustees to make the kind of edits you propose to packages that agree to participate, and let them propagate through.

bergmark commented 8 years ago

We could experiment on Hackage only, by putting the original build-depends into commented blocks

Note that you can also have x- prefixed custom fields, that way we'd get at least a little bit of structure enforced.
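Cabal accepts fields prefixed with x- as custom fields, so the original bounds could live in something like a hypothetical x-author-build-depends field instead of comments. Below is a simplified Python reader for x- fields. It does not track which section a field belongs to, and `x_fields` is a hypothetical helper, not a Cabal API:

```python
def x_fields(cabal: str) -> dict[str, str]:
    """Collect x- fields as name -> value, joining continuation lines.
    Simplified: does not track sections or Cabal's finer layout rules."""
    fields: dict[str, str] = {}
    current, current_indent = None, 0
    for line in cabal.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("--"):
            continue
        indent = len(line) - len(line.lstrip())
        if current is not None and indent > current_indent:
            fields[current] = (fields[current] + " " + stripped).strip()
            continue                      # continuation of the x- field
        current = None
        name, sep, value = stripped.partition(":")
        if sep and name.strip().lower().startswith("x-"):
            current, current_indent = name.strip().lower(), indent
            fields[current] = value.strip()
    return fields
```

Unlike free-form comments, a named field gives tools a fixed key to look up, which is the "little bit of structure" mentioned above.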

alanz commented 8 years ago

@bergmark I think it is about experimentation at the constraint management level, inside hackage and in terms of what the devs upload. In practical terms, nothing should have to change in terms of how the information is propagated from hackage to the various clients out there, including stackage.

alanz commented 8 years ago

FWIW, although the commit I made was to Cabal, the required functions are already exported from it, so the code can live anywhere. This makes it clear that it has no impact on the client.

23Skidoo commented 6 years ago

@hvr, now that we have ^>=, what's the updated status of this ticket?