haskellfoundation / tech-proposals

The Haskell Foundation Tech Proposal Process

Introduce GHC.X.hackage proposal #27

Closed bgamari closed 2 years ago

bgamari commented 2 years ago

For a few years now the GHC team has maintained head.hackage, a set of patches to a subset of Hackage that allows those packages to be built against pre-release compilers. While head.hackage has been an invaluable resource for testing GHC, its user base has been limited to GHC developers.

In the past few months, there have been a few proposals seeking to lower the cost of migrations to new GHC releases. A common theme in these discussions has been the long lead-time for adoption of new releases due to the largely serial nature of library migration. We propose to extend head.hackage as a tool to help users migrate and test their libraries.

Rendered version

nomeata commented 2 years ago

https://github.com/bgamari/tech-proposals/blob/ghc-x-hackage/proposals/001-ghc-x-hackage.md might be a rendered link

nomeata commented 2 years ago

Thanks for writing this up!

The proposal envisions one overlay per major GHC version. But a major GHC version may not be the only case where a large-scale migration needs to happen: if a library far down (or up) in the dependency tree changes in ways that require adjustments in many libraries, it seems we are in a similar situation. Could we have Hackage overlays for these cases as well?

Thinking in the other direction: is having multiple overlays the right call? I see the advantage of not worrying about other GHC versions, and the implicit garbage collection, but there is also redundancy (likely the patch for GHC-9.14 is also needed for GHC-9.16). How would a single “compat patches overlay” repo fare?

tomjaguarpaw commented 2 years ago

Sorry to be a petty bureaucrat, but "What's the process for submitting a HFTP?" says:

Before submitting a HFTP, it is required that you ... Discuss your idea on the Haskell.org Discourse. ... Create a Discourse topic under the category "Haskell Foundation", that starts with "Pre-HFTP" ... Proposing your ideas on the Discourse is not an optional step

I don't see such a discussion on the Haskell Discourse. If I have missed it could you please link it? If not, could you please start the preliminary discussion there?

bgamari commented 2 years ago

I have started a discussion here.

TeofilC commented 2 years ago

This proposal looks great. I'd be keen to help out once the ball gets rolling.

I have a couple of questions/suggestions:

  1. Keeping track of compatible packages. I feel like it would be quite helpful to have a website that shows which packages from Hackage build with the help of the overlay. If a package can't be built, it could show the error or a list of failing dependencies. This would differ from head.hackage's current behaviour, as it only tries to build patched packages or those explicitly mentioned, rather than all of Hackage. I think right now it's hard to tell which packages are actually compatible with new GHCs other than by trying to build them yourself. And increasing visibility might lead to more patches and a better idea of ecosystem compatibility. The release of the new Stackage nightly that uses GHC-9.2 seemed to have this effect: it became clearer which packages were failing, and that led to a flurry of patches.

  2. Stack compatibility. As far as I can tell, the proposal doesn't mention compatibility with stack, and I don't think stack supports an additional-package-index facility like cabal's. Do you have plans for stack support? I think one way this could work is to make available a patched version of the latest LTS and/or nightly snapshots.

gbaz commented 2 years ago

To update: we met today and we look on this favorably. As it appears active work on fleshing out the implementation is being undertaken by the Stability Working Group, we'll hold off on formal approval until everything shapes up.

Ericson2314 commented 2 years ago

In the (very long) discussion thread in https://github.com/haskell/core-libraries-committee/issues/22 it came up that doing library impact assessments today is also difficult.

I would like to see GHC.X.hackage also help with that. In particular, it should be very easy to make changes to base on a branch, make the necessary changes to the corresponding Hackage overlay on a branch, and kick off a "rebuild the world" integration test with those two branches.

In particular, a proposed litmus test for pulling the plug and doing the Data.List breaking change was first having patches (or merged changes) in all Stackage libraries bringing them into compliance. I think that is a very fine hurdle for breaking changes to clear, and if this proposal builds good infrastructure for that, such as I just mentioned, it won't even be a hard hurdle to clear.

Ericson2314 commented 2 years ago

@Bodigrim left a very nice description in https://github.com/ghc-proposals/ghc-proposals/pull/287#issuecomment-1120173984 of how the Simplified Subsumption regression test was botched. That's a bummer, but it also makes crystal clear how this proposal can help.

When the way to do such a test is well documented/automated, and the baseline control is well-maintained (so unrelated failures don't ruin the results), doing such experiments will be much easier and more reliable.

chreekat commented 2 years ago

Hi! Since I'm the "HF resource" (devops engineer) who might work on this, I thought I'd check on status. I don't have a personal opinion one way or the other yet, since I'm not really plugged in, but if everybody else has a consensus, maybe we could finish crossing the t's and dotting the i's.

simonpj commented 2 years ago

Hi Bryan, thank you.

I suggest:

  1. You review the proposal, with Ben, in the light of whatever feedback we have gotten so far -- and of course your own views.
  2. Explicitly consult, via direct email, with appropriate folk from

    • stack,
    • cabal,
    • hackage,
    • the core libraries committee

    to check that they are on board.

  3. Revise and polish in the light of this feedback.
  4. Broadcast a call to library authors to invite their feedback. "Here's a plan, we'd like to consult you". In principle they could be contributing now, but there are thousands of library authors, and few of them will be following this repo.

    In this phase I'd suggest also writing personally to the maintainers of a few dozen key libraries (i.e. ones with many dependencies).

Actually doing all this takes a bit of bandwidth, which we are always short of, but which your presence will help with a lot.

It's as much about building consensus as about technical content!

All this is just my suggestion... the rest of the Stability Working Group may have other views.

bgamari commented 2 years ago

Indeed, this is a project which we hope to have you work on, @chreekat.

ndmitchell commented 2 years ago

I have attempted to use head.hackage in the past, and it didn't work for me. The specific problem I encountered was that a package I depended on had a policy of not releasing to real Hackage, only head.hackage, until a stable GHC was released. That package required a changed API to be compatible with the new GHC, so I was left with the choices:

  1. Land the patch in the GitHub repo, and break my ability to make releases if anything else comes up.
  2. Don't land the patch in the GitHub repo and make releases to head.hackage instead.
  3. Use branches, cherry-picking, etc., adding lots of cost to my development workflow.

All of those options were unpleasant. The specific package in question was very low down in the tree. I appreciate that head.hackage can parallelise updating dependencies, but uploading to Hackage must be serial. I think it is very important that head.hackage doesn't cause people to delay on uploading to Hackage, which it most definitely has in the past.

chshersh commented 2 years ago

As much as I want to reduce the churn around upgrading to a newer GHC, I'm afraid this proposal is a step in the wrong direction. It will waste the time of many people, potentially increase the fragmentation of the already fragmented Haskell community, and increase frustration in some areas (even if it reduces it in others). I'm going to elaborate in detail on why I think so.

Let's start with flaws in the described workflow:

  1. The maintainer of C must wake up, fix their package, and make a Hackage release. The maintainers of A and B can do nothing at this point.

This process is terrible in lots of ways:

  • It is utterly serial. The maintainer of B cannot lift a finger until the maintainer of C has not only fixed package C, but also uploaded a new release to Hackage.

This is not true. Maintainers A and B can submit fixes to package C instead of doing nothing. Both cabal-install and stack allow depending on specific commits of packages, not only on Hackage versions. So people can contribute patches directly to packages and test those patches in their own packages without waiting for the Hackage release.
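
To make that concrete, here is a minimal sketch of such a pin in a cabal.project; the fork URL and commit hash are placeholders, and stack's extra-deps supports the same idea with git/commit entries:

    -- cabal.project fragment: use an unreleased fix for package C
    source-repository-package
      type: git
      -- a contributor's fork of C carrying the GHC compatibility patch
      location: https://github.com/some-contributor/C
      -- the exact commit containing the fix (placeholder hash)
      tag: 0123456789abcdef0123456789abcdef01234567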

Each step has multiple serial parts: often the maintainer will merge a patch (perhaps in response to prompting) but not do a release, blocking further progress.

We can help maintainers by simplifying the Hackage upload workflow or providing the necessary CI integrations (e.g. in the form of documented CI workflows with examples). This is immediately helpful to everyone and will improve the situation straightaway, without paying the high cost of this proposal.

If a maintainer is unavailable for any reason, the entire dependency tree of that package is blocked. In an ecosystem with hundreds of widely used packages, the chances of every single maintainer being available in a timely fashion are close to zero.

The GHC.X.hackage proposal doesn't solve this problem. First of all, this is not a problem in Haskell at the moment, since everyone can contribute patches upstream and packages can depend on specific commits. Secondly, if by saying that "the entire dependency tree of that package is blocked" the proposal authors mean "fixes are not on Hackage", the GHC.X.hackage proposal won't help here because it's, well, not Hackage either.

It is not clear to even a willing and available maintainer when they need to wake up and do some work. Often it is up to a motivated individual (say the maintainer of an application that uses A) to sequentially bug the maintainers of C and B and A in sequence so they know that they are able to do something.

Having a more predictable GHC release schedule helps with that. If maintainers know that they need to wake up, e.g., every February 18 to update to a newer GHC, they can sleep in peace for the rest of the year. The other proposal, about a tick-tock release cycle for GHC, helps more here and moves in the right direction.


So to me, it looks like the only feature GHC.X.hackage adds is the ability to write a single line in the package configuration -- an alias for a bunch of patches -- instead of enumerating all patches one by one. This is a nice benefit, but I don't think it's worth the cost.

  1. GHC.X.hackage simply creates double work. In addition to creating a patch directly to the used package, a contributor also needs to open a patch to GHC.X.hackage (if they're using GHC.X.hackage). Not to mention that GHC.X.hackage maintainers need to spend time reviewing those patches, in addition to the original maintainer reviewing the patch.
  2. GHC.X.hackage widens the fault horizon. What will happen is that maintainers will always use GHC.X.hackage and we'll effectively have two Hackages. And everyone must look in two places to understand what's going on when things go wrong. And they will go wrong. There's no way to guarantee that the patch proposed to GHC.X.hackage will be the same patch merged into the original package. And when that happens, it'll create lots of frustration.

The main explanation for why the Haskell community has delayed GHC support (besides breaking changes in GHC itself) is the fact that widely-used packages are maintained by single volunteers. And volunteers don't have to do what you want them to do; they'll do what they want, whenever they want.

If the maintainer of a widely-used library disappeared from the Haskell community forever, then GHC.X.hackage would carry that patch in perpetuity, requiring everyone to always use GHC.X.hackage. Otherwise, GHC.X.hackage has just asked lots of people to do lots of redundant work.


In conclusion, I feel that this proposal tries to solve the problem of scarce volunteer resources by requiring volunteers to do even more, which obviously doesn't work.

tfausak commented 2 years ago

I was in the midst of writing my own comment, but @chshersh said everything I was going to say. (And said it better!)

For what it's worth, I've never used head.hackage. When upgrading GHC, I've used the Git source feature of both Cabal (source-repository-package) and Stack (extra-deps) to work around problems.

parsonsmatt commented 2 years ago

(I have not read the proposal itself yet, so my thoughts here are entirely untainted with any idea of what we're talking about)

Having just done a bit of work developing compatibility with GHC 9.4 for the yesod ecosystem, I actually found the process to be quite smooth. I started with persistent, since I maintain that library, and I used ghcup to acquire the 9.4 prerelease and begin compiling against it. I set up a cabal.project, set --allow-newer, and traced the build failures. When a package failed to build, I forked it, fixed it, made a PR, and then referred to my fork using cabal.project's source-repository-package feature.
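
As a sketch of the configuration this workflow produces (the relaxed bounds are illustrative, not the actual yesod set):

    -- cabal.project used while tracing build failures on a prerelease GHC
    packages: .

    -- relax upper bounds on the boot libraries the new GHC bumps
    allow-newer: base, template-haskell

    -- ...plus one source-repository-package pin per forked-and-fixed
    -- dependency, as sketched earlier in this thread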

It took about two days of labor to get yesod's tests passing with GHC 9.4.

The primary difficulty, IMO, is updating the strict version upper bounds. I wonder if there's a way to automate that process -- i.e., if cabal test --allow-newer completes successfully with a base that is above the bounds, we can bump the bounds. Or if there's a way to "flag" these packages in a semi-automated fashion.

After that, actually implementing the changes needed for base and ghc-prim was relatively straightforward. The pre-release notes and Migration Guide answered all of my questions.

The main advantage that a GHC.X.hackage would have is that it would share this work. I have a PR to yesod that includes a cabal.project that makes it work with GHC 9.4, which you'd copy/paste into your own project, along with depending on that commit of yesod, and then you can test your app. But presumably many such projects have happened or will happen -- not just for yesod, but for servant or snap or beam etc. And then every one of those will run into the same build issues with cereal, foundation, vector, etc. that need patches. And -- unless they see my PRs to those repos and save themselves the work -- they may end up duplicating it.

So, if GHC.X.hackage allows us to collaboratively provide these patches -- then that's great! But if I'm dependent on an absent maintainer (i.e. foundation's) to upload to this Hackage overlay, then we're in a bad spot. But why should I (some rando) get to upload to a Hackage overlay for some package I'm just doing a drive-by contribution to?

Makes me think that the 'proper' fix is to have an index of Haskell packages that are known-to-not-build (either via constraints, e.g. base < 4.17, or via attempted builds that fail), linked to a set of potential fixes (possibly community-submitted, without guarantee of correctness/completeness -- maybe even just links to GitHub PRs or commits). So when I go to build shakespeare, it says: "The Hackage version is known-to-not-build with GHC 9.4. There's a git commit available here. Want to add this to your cabal.project?"

hasufell commented 2 years ago

I think there's a plethora of issues:

  1. having clear visibility of upcoming API breakages and build failures, and automatically notifying Hackage package maintainers about those (can we detect this automatically? Can this be done semi-automatically as part of the GHC release process? How do we notify?)
  2. having a central place for communication/coordination of migrations. Currently the closest we have to this is the Stackage issue tracker, e.g. wrt aeson-2: https://github.com/commercialhaskell/stackage/issues/6217
  3. an easy way for people to start their work based on the currently available patchset

I think the main issue is figuring out build failures and having clear public visibility about those. This is something that should be part of hackage itself. And even point 2 could be argued to be part of hackage.

This proposal somewhat fixes point 3, but I think that's not the biggest issue of all. head.hackage could simply be a cabal.project.local pointing to all the upstream PRs instead of another Hackage repository. The reason it's a Hackage repository is probably that, for GHC development, it might be unrealistic or infeasible to create upstream PRs for every single patch. But that is exactly what we want.

parsonsmatt commented 2 years ago

OK, having read the proposal and learned what head.hackage even is as a "Hackage overlay":

The source of GHC.X.hackage is held in a Git repository, and accepts patches from community members who need not be the package maintainer. For example, the maintainer of package A might submit patches to fix B and C. This will use the same infrastructure as the existing head.hackage repository: here is an example of a PR for head.hackage.

The workflow is a bit clunky. I'd really prefer to just say "Here's a GitHub pull request, please make the relevant patch stuff for me." I'm already going to be making a PR upstream with my fixes, might as well share the work here.

One problem with integrating this into my workflow is that many packages I maintain are part of multi-package repositories. So a change to persistent means I need to actually, like, figure out how to use git in some way and cherry-pick the changes over, for each package that needs to be changed. Support for a 1:many repo:package relationship would make this trivially easy: e.g., git clone my sweet persistent fork, check out a branch, and generate the relevant patch files for the packages that need to change.
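
For what it's worth, on the consumer side cabal's source-repository-package stanza does have a subdir field for exactly this case; a sketch, with an illustrative fork URL and placeholder commit:

    -- cabal.project fragment: two packages taken from one
    -- multi-package repository (fork URL and commit are placeholders)
    source-repository-package
      type: git
      location: https://github.com/parsonsmatt/persistent
      tag: 0123456789abcdef0123456789abcdef01234567
      -- recent cabal versions accept several subdirectories here,
      -- one per package living in the repository
      subdir: persistent persistent-sqlite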

The inclusion criteria for a patch are:

  1. The patch should represent a patch-version level change according to the PVP, in particular it should not change the package’s API.

Hm. Some packages re-export things from ghc-prim or base in a way that makes this impossible. Or the changes to base etc. require a breaking change or addition to the API. What's the expected solution there?

For the cabal.project and extra-source-repositories approach, it's fine - I just depend on what I need and fix what needs fixing, making patches as we go. A new minor/major version bump as part of upgrading to a new GHC is somewhat expected, IMO.

(ah, this is covered slightly further down in the doc)

6.3 Precognitive releases

This whole section seems really dicey. I think the discussion is mostly correct -- the system is too error-prone to be worth any investment of energy. I'm happy having a PR up that links to dependent PRs that require a release before a Hackage version can be made, and providing a cabal.project or stack.yaml that can be used in the interim.

But perhaps B requires no updates at all to work with GHC X (this is a common case). Then this message would be over-conservative. Maybe Hackage could proactively set the Tested-With field, by building the package and running its test suite? Or maybe we need two fields: a manual one and an automatic one.

I've repeated over and over again that we have a problem with upper bounds, and that is that we conflate "this is known-not-to-work" and "this is not-known-to-work."

For a bound "This is not-known-to-work," we can always just test it and see if it works when the relevant version comes out. Then we either update it to "This is known-to-work" or "This is known-to-not-work."
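
To make the conflation concrete, consider a .cabal fragment (the package and versions are made up):

    library
      -- the upper bound below cannot express which of two things it means:
      --   known-to-not-work:  this package genuinely breaks with base-4.17
      --   not-known-to-work:  nobody has tried building with base-4.17 yet
      build-depends: base >= 4.14 && < 4.17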

If Hackage had some support for automagically bumping upper bounds on GHC boot packages (e.g. base, ghc-prim, etc.) that would be pretty sweet. It seems much more intractable if we're trying to globalize that, though, since you'd need a huge matrix...

The presence of allow-newer has made this significantly easier, at least.

hmmmmm 🤔

My overall impression is that this is a lot more work than my current workflow. It'd be nice to share some of the work (I'm sure I'm not the only one to patch cereal, even if I'm the only one to have made a PR), but if folks aren't willing to make PRs upstream with their patches, they're probably also not willing to make a PR to a Hackage overlay.

Really, I just want to be able to share extra-source-repository stanzas, such that it's trivial for me to contribute one and it's trivial for me to find them.

tfausak commented 2 years ago

Really, I just want to be able to share extra-source-repository stanzas, such that it's trivial for me to contribute one and it's trivial for me to find them.

With Stack, you can create and share a custom snapshot that includes a bunch of extra-deps using Git sources. In fact we use this custom snapshot approach to have a unified set of dependencies among internal packages. (Our snapshot is public, but you probably don't want to depend on it.) The community or the Haskell Foundation or GHC developers could provide a snapshot to rally around when upgrading GHC.

With Cabal, I don't think it's quite as nice. As @hasufell mentioned, you can cram all the same stuff in a cabal.project.local. But I don't think there's a good way to use such a shared config from Cabal directly. People would have to download the file and include it in their project. Not the end of the world, but not as seamless as throwing resolver: some-url in your stack.yaml file. (Also I think Cabal preemptively downloads all Git sources, whereas Stack only grabs them when needed.)

For Cabal developers, head.hackage (and this proposal) provide a developer experience that's similar to Stack's. You just add a new repository to your Cabal configuration and you're good to go. That makes me wonder: Is it possible to make an ad hoc Hackage overlay from a cabal.project file? That would allow someone to produce a cabal.project using source-repository-packages for, say, GHC 9.4; and then share that configuration as an overlay.
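
For reference, the consumption story for a cabal user is roughly one extra stanza; the URL below is head.hackage's published one as far as I know, and the signing keys are elided from this sketch:

    -- in cabal.project (or the global cabal config)
    repository head.hackage.ghc.haskell.org
      url: https://ghc.gitlab.haskell.org/head.hackage/
      secure: True
      key-threshold: 3
      -- root-keys: (the repository's key fingerprints go here;
      -- omitted in this sketch)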

I'm just thinking out loud here, so feel free to ignore all this. I mainly wanted to point out that you can already get something like GHC.X.hackage using a custom snapshot with Stack.

hasufell commented 2 years ago

But I don't think there's a good way to use such a shared config from Cabal directly. People would have to download the file and include it in their project.

No, they won't. The next cabal release will support include directives for cabal.project files, including fetching them from remotes.

These may also include resolver constraints, so it's much more ergonomic than using stack for this.
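
If that support lands as described, sharing a migration configuration could look roughly like this (the URL is hypothetical):

    -- cabal.project, assuming the upcoming remote-import support
    -- (https://github.com/haskell/cabal/pull/7783)
    import: https://example.org/ghc-9.4-migration/cabal.project

    packages: .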

tfausak commented 2 years ago

The next cabal release will support include directives for cabal.project files, including fetching them from remotes.

Great! That was news to me, so I hunted down the pull request: https://github.com/haskell/cabal/pull/7783

brandonchinn178 commented 2 years ago

This is not true. Maintainers A and B can submit fixes to package C instead of doing nothing. Both cabal-install and stack allow depending on specific commits of packages, not only on Hackage versions. So people can contribute patches directly to packages and test those patches in their own packages without waiting for the Hackage release.

I want to echo this. Upgrading my company's codebase to GHC 9 was a not-difficult process, as I could just make forks of upstream repos and reference the new commits in extra-deps.

I'd rather see updates to this issue: https://github.com/haskell/cabal/issues/7821

gbaz commented 2 years ago

It sounds like the main concern is duplicate work. Perhaps the process for "constructing" the head overlay can make use of people submitting pointers to patches automatically, so as to reduce duplication here?

hasufell commented 2 years ago

It sounds like the main concern is duplicate work.

I'm not sure it is. As I pointed out, I'm not convinced this solves the core issue, which is visibility of required work, an overview of current patches, and a communication platform.

head.hackage seems to be somewhat specific to GHC workflow.

What makes us think it will help engage less active maintainers? It only talks about the patching workflow.

gbaz commented 2 years ago

I want to suggest that a mechanism like this is better than nothing -- including in terms of an overview of current patches, and visibility of the work.

In particular, if these patches are generated from a git repo, then one gets an overview of all of them and visibility into them. Further, GitHub tickets on the repo are a reasonable communication platform.

The alternative workflow (what works now) that people are suggesting seems to be that everyone just builds their own ad-hoc list of downstream patches and PRs to the things they depend on. If one collects a centralized collection of such things in a consumable way, then you arrive at, in essence, a Hackage overlay. So the strict improvement to what exists now seems to lead directly to something like head.hackage (although building it from existing PRs when possible, rather than forcing duplicate commits of work, seems like a very important concern!).

That said, I do find the idea of using a remote cabal.project file rather than a Hackage overlay index an interesting technical question. When head.hackage was first developed, we didn't have the tools to do this (including, I believe, git deps) -- and we only fully will in the upcoming version of cabal, when includes (including remote ones) in cabal.project files become available.

Because these features are so new, I really don't know the ergonomics on either the producer or consumer side of using that approach, but I think it is well worth exploring in terms of this proposal.

hasufell commented 2 years ago

I want to suggest that a mechanism like this is better than nothing -- including in terms of an overview of current patches, and visibility of the work.

But we're starting the discussion about patch workflow, while there's not even a clear generated list of broken reverse dependencies.

This is where, e.g., Linux distributions start when upgrading a toolchain:

  1. CI builds reverse deps and collects failures (public visibility)
  2. a meta-ticket is posted on the issue tracker about the migration (communication platform)
  3. individual tickets to other packages (and possible patches) are added as dependencies of the meta-ticket (patch workflow)
  4. maintainers are CCed (notification)

This proposal doesn't approach the problem in a structured way, imo.

gbaz commented 2 years ago

Ah, now I see your point -- having a centralized point for patches is nice, but it would be much more useful if accompanied by some centralized forcing function that recorded the results of builds of some universe of packages and their blockers. I'm now leaning towards thinking the proposal should discuss how we could get there, even if it's somewhat of a stretch goal vs. what now exists -- HF proposals can and should be a bit "stretchy", because they're not just about what we can say would be nice with the resources we have, but also about what we would aspire to do with more resources -- which in turn can be used by the HF board as a way to reach out and get those resources through corporate donations, etc.!

simonpj commented 2 years ago

having a centralized point for patches is nice, but it would be much more useful if accompanied by some centralized forcing function that recorded the results of builds of some universe of packages and their blockers

That sounds like a sensible thing to do. After all, we can hardly engage the community in generating patches that apply simple fixes to packages, if the community does not have a "work-list" to work from.

But would it not also make sense to have a place to collect such patches and make them systematically available to others? That's what this proposal suggests, as I understand it.

I think that everyone on this thread is looking for constructive suggestions here. If this proposal isn't going to help, what is? (Beyond "make no breaking changes, ever", which is a perfectly defensible position, but one that I would deeply regret.) Specific, concrete suggestions would help us all.

simonpj commented 2 years ago

In conclusion, I feel that this proposal tries to solve the problem of having scarce volunteering resources by requiring volunteers to do even more which obviously doesn't work.

Put like that, it sounds very reasonable. And perhaps you are right. But there is another way to look at it. Perhaps there are lots of volunteers who would be willing to help update packages if only they had a way to help. Centralising a way for people to work together to generate patch sets across hundreds of packages might unlock that suppressed potential. And it might make less work for the package authors too, because they just have to merge an already-tested PR, rather than generate it.

I'm not a package author, so I don't know all the dynamics here. But I'm keen to find ways to relieve the stress on hard-pressed package authors. Perhaps there are other ways to do that?

hasufell commented 2 years ago

I think that everyone on this thread is looking for constructive suggestions here. If this proposal isn't going to help, what is?

Well, I think I sketched the required action points broadly in my previous comment.

Additionally, the Stackage team has already been doing much of exactly what we're discussing here: building a set of packages in CI, collecting failures, and communicating required work and existing patches.

What we would need is simply formalizing and automating this process better, so that both package authors and potential volunteers know what's up and what to do. The details may or may not involve GHC.X.hackage, but I'd argue we need to look at this from a holistic workflow POV instead of pushing isolated solutions.

michaelpj commented 2 years ago

I see a lot of comments along the lines of "this seems like it will be annoyingly extra work". I think this is a really key point, but it's something we can alleviate with tooling. We've been working on foliage, a tool for making Cabal package repositories from source specifications, which I think makes this significantly easier (although maybe it's still too much work!).

You can see an example of a foliage source repository here, and here's the specification of a package version (note the support for the subdir field, @parsonsmatt!). So the content you need to PR is basically just a little metadata file with a URL for the source tarball, which you can easily get for, say, an arbitrary GitHub PR.

The repository itself is just built with GitHub Actions and lives in the gh-pages branch.

Really, I just want to be able to share extra-source-repository stanzas, such that it's trivial for me to contribute one and it's trivial for me to find them.

Note that a source-repository-package stanza can only provide a patched version for a single version of the package. If we need to fix both version X and Y we are in trouble. Which is admittedly going to be a rarer case (but we might well want to patch, say, a version of both aeson-1 and aeson-2).

I think the same comment applies to the discussion of using extra stack resolvers, but perhaps it's less of a problem there since stackage tends to work with a single version of packages anyway.

Is it possible to make an ad hoc Hackage overlay from a cabal.project file? That would allow someone to produce a cabal.project using source-repository-packages for, say, GHC 9.4; and then share that configuration as an overlay.

This is not quite what foliage is, but it aims to be approximately as much work (in that it's basically just a big list of URLs).

GHC.X.hackage simply creates double work. In addition to creating a patch directly to the used package, a contributor also needs to open a patch to GHC.X.hackage

In the current workflow, the contributor needs to:

  1. Create a patch to the package
  2. Pin the patched package with a source-repository-package stanza

I think if we could make contributing to GHC.X.hackage nearly as cheap as 2, and reduce the overall frequency of this process due to sharing, then it could be a big win overall.

(How could we make contributing that cheap? Imagine we had a standard issue structure for "package version foo-X doesn't build", and you could comment on that with something like "fixed in $PR_LINK" and a bot would make a PR adding that to the repository. Work, yes, but pretty feasible work.)


Another big question is: who is this for?

This is not true. Maintainers A and B can submit fixes to package C instead of doing nothing. Both cabal-install and stack allow depending on specific commits of packages, not only on Hackage versions. So people can contribute patches directly to packages and test those patches in their own packages without waiting for the Hackage release.

When a package failed to build, I forked it, fixed it, made a PR, and then referred to my fork using cabal.project's source-repository-package feature.

The main advantage that a GHC.X.hackage would have is that it would share this work.

I think @chshersh and @parsonsmatt describe how the current situation is not too bad for a maintainer who:

  • Is competent and adept at navigating the Haskell ecosystem (can find broken upstream packages, diagnose them, make and submit fixes, and thinks this is not a big deal)
  • Does not have too big a dependency footprint (so that fixing their broken upstream packages is feasible)
  • Has the time to do the work (two days of labour for Matt!)

If any of these are not true, then GHC.X.hackage becomes much more appealing, because:

However... I think this picture undermines the motivation somewhat. If we think that GHC adoption blockage mainly happens due to packages low down in the dependency tree -- well, those packages likely have maintainers who are experienced and competent, and they probably don't have a big dependency footprint. So maybe GHC.X.hackage would not be an improvement over the status quo for these people, and we don't need to worry so much about the others.

A motivating example for me here was HLS. We used head.hackage to help prepare GHC 9.2 support, and it was very useful. But HLS is unusual: it's increasingly a "core" package (lots of people care if it works!), but it has a big dependency footprint and lots of fragmented maintainership. So HLS I think would benefit from GHC.X.hackage, but perhaps it's an exception.


Misc

I think it is very important that head.hackage doesn't cause people to delay on uploading to Hackage, which it most definitely has in the past.

GHC.X.hackage widens the fault horizon. What will happen is that maintainers will always use GHC.X.hackage and we'll effectively have two Hackages.

I think this is a real risk, and we should take steps to avoid it. My proposal would be simple: aggressively retire GHC.X.hackage a fixed time after GHC X is released. And by "retire" I mean "take down the URL; anyone still using it is broken". Using GHC.X.hackage can be helpful while still obviously being something you must not rely on.

But why should I (some rando) get to upload to a Hackage overlay, for some package I'm just doing a drive-by contribution for?

Several people have suggested collecting source-repository-package stanzas and sharing them, presumably contributed by "some rando". We already cargo-cult them from each other, with little vetting (at least if you're me :sweat_smile:). GHC.X.hackage is just a big collection of patches; it's not a set of trusted releases by the maintainers.

I do think there's a messaging risk here. I wonder if we can alleviate this with naming. GHC.X.unofficial.hackage? GHC.X.unsafe.hackage?

And volunteers don't have to do what you want them to do, they'll do what they want and whenever they want.

Volunteers also often do a lot more than we expect them to. Despite being used mostly by GHC devs, head.hackage has lots of patches. I don't see a reason to think that GHC.X.hackage would get less engagement. And if nobody wants to contribute... well, that's a shame and we can give it up as a failure.

That sounds like a sensible thing to do. After all, we can hardly engage the community in generating patches that apply simple fixes to packages, if the community does not have a "work-list" to work from.

An obvious place is: the issue tracker of the git repository that GHC.X.hackage is built from!

brandonchinn178 commented 2 years ago

One additional point I just realized: not all packages are on GitHub (some packages don't even have a repo link! I had the misfortune of running into such a package recently, and I could only use Hackage's Source links to view the source), so having a centralized platform would provide a consistent interface for this.

But the main point I'm still hung up on is the idea that this proposal would change the "normal" workflow. Right now, on my open source repos, I have a workflow (that I like) where contributors should fork the repo, make a PR, I review it, and merge it. With this proposal, I would also have to somehow be notified when GHC.X.hackage has a patch for my project (there's already an existing issue of maintainers not being contactable via their Hackage info) and copy/paste the patch into my git repo. Then there's two sources of truth for the patch: GHC.X.hackage and the source repo.

To reiterate: IMO, this proposal solves the wrong problem. The main bottleneck I experienced is maintainers not merging PRs quickly enough into the canonical repo (indeed, I still have a PR in the monad-validate repo for supporting GHC 9 that hasn't been merged). This proposal doesn't solve that problem; it just provides another mechanism for specifying a patch (which we already have, for GitHub repos). Perhaps fortunately, all the libraries we needed to touch were either on GitHub or GitLab, so maybe that's where my lack of motivation for this proposal is coming from.

Where this proposal really shines (and what it should lay out as the primary motivation) is being a standard interface for providing patches. The primary motivation currently given -- things being too serial and whatnot -- describes issues that are not issues / are already solved (with the caveat of the cabal issue I linked previously). But this proposal should also lay out how it will integrate with maintainers' existing workflows.

One last point I have is looking at other language ecosystems. Rust and Python don't have any notion of an overlay; if packages break on new language versions, you have to fork upstream and patch, which is why you try to choose packages that are actively maintained (which should also be your logic when choosing a Haskell dep). NodeJS doesn't have a notion of an overlay either, but with the recent rollout of Yarn 2, Yarn did provide a way to overlay in the equivalent of stack.yaml/cabal.project (docs), and it also provided a global overlay here, so that people didn't have to copy the same overlay into each project. So there is a bit of precedent here, albeit Yarn only dealt with package metadata, not overlaying actual source code.

parsonsmatt commented 2 years ago

The mention of Stackage made me wonder: how hard would it be to get something like Stackage for GHC HEAD? Or even a prerelease Stackage? Right now Stackage Nightly kinda serves that purpose, but it often lags far behind GHC HEAD, often by major versions. A Stackage HEAD could serve this purpose quite nicely.

bergmark commented 2 years ago

The mention of Stackage made me wonder: how hard would it be to get something like Stackage for GHC HEAD? Or even a prerelease Stackage? Right now Stackage Nightly kinda serves that purpose, but it often lags far behind GHC HEAD, often by major versions. A Stackage HEAD could serve this purpose quite nicely.

I was attempting to create a snapshot for the 9.4 alpha (https://github.com/bergmark/stackage/commit/c0831498d6071b8082ba070c5084897c771d0fea) but I got busy with other things. I think there are a few things missing:

gbaz commented 2 years ago

But the main point I'm still hung up on is the idea that this proposal would change the "normal" workflow. Right now, on my open source repos, I have a workflow (that I like) where contributors should fork the repo, make a PR, I review it, and merge it. With this proposal, I would also have to somehow be notified when GHC.X.hackage has a patch for my project (there's already an existing issue of maintainers not being contactable via their Hackage info) and copy/paste the patch into my git repo.

I don't think this is the case at all, in any version of this proposal. Which is to say that head.hackage.haskell.org already exists, and has for a long time! It is expected that it is the responsibility of patch authors to ensure that they also upstream their patches (as it is with Debian distro-patches to packages, etc.). And the foliage workflow suggested above seems like it would improve that situation by making, when possible, PRs to the existing repo the canonical source for as many GHC.X.hackage patches as possible.

That said, it is well known that the existing system has a clock and gets rolled over and replaced at a certain point, so it does not turn into an ongoing "source of truth".

Where I can see some concerns is that if this proposal tends towards making GHC.X.hackage something used for a "long time" rather than just as an interim measure, people might start to treat it more as an authoritative source rather than a stopgap. I think that such repos should only be used for aiding ecosystem migrations and for helping package authors move quickly towards them (and, as now, for ensuring GHC HEAD has a sufficient universe to stress-test against).

As such, expanding the usage to cover released as well as pre-release GHCs seems fine -- but only to a point! In particular, while we could keep these overlays around "forever", and we currently keep them around insufficiently long, I would prefer that we still had some "clock" (1 year? 2 years?) on these repos as a forcing function, to make sure that people only put out supported released stuff pinned against central package repositories, and not against these overlays.

gbaz commented 2 years ago

Based on our HFTT discussion, I think the sentiment was that this proposal doesn't have to extend its scope to cover CI etc., even though those are great additional ideas for further proposals (and if the authors want to extend the scope here somewhat, we'll have no objections!). However, it should be rewritten to focus on the core thing it solves -- sharing work in letting authors update and test packages against newer GHCs (and also giving a gauge of the breakage induced on GHC HEAD) -- and also to concretely explain the simple workflow it enables (building repos out of lists of git hashes or the like).

My concern about "alternate sources of truth" was not widely shared -- rather, people motivated why having these GHC.X.hackage overlays be long-lived was useful for git bisect, CI, etc. That said, I think the proposal still needs to address this as a concern, and motivate why the issue is not that bad (i.e. that the social factor of releasing libraries to Hackage as a centralizer will ensure that pinning against odd Hackage overlays happens at the leaves of proprietary app code, not on the trunk or branches of the open-source library ecosystem).

maurobalbi commented 2 years ago

@gbaz As I understand it, the proposal will allow maintainers to work on their packages in parallel, but the release to Hackage will still be sequential (since packages with overlay dependencies are not allowed on Hackage).

Ericson2314 commented 2 years ago

@maurobalbi That is not quite the case. The reasoning is sort of tricky, but if you don't need to bump your bound, you can always upload a new version. E.g., if you are working on library A and have a ^>= 1.1 dep on library B, and the overlay proves a PVP-compatible variant of B-1.1 is possible, you can go ahead and upload your new version of A.

The idea is that if a new version of B is uploaded that is validly within ^>= 1.1, then your A will in fact work with it; and if that B supports the newer GHC, then the combination of it and your new A will also support the newer GHC.

(The larger story here is that the combination of a versioning/what-constitutes-breakage policy with the fact that version bounds include yet-to-be-released packages makes version bounds much more structured than people think, and gives us theorems like this.)
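
To spell the theorem out with a cabal fragment (versions are illustrative):

    -- A.cabal fragment
    library
      -- ^>= 1.1 abbreviates >= 1.1 && < 1.2, so it already admits any
      -- future B-1.1.x. If the overlay demonstrates that a patch-level
      -- B-1.1.1 can support the new GHC, A can be released today: once
      -- that B-1.1.1 reaches Hackage, the released A and B satisfy the
      -- same bound and build together on the new compiler, with no
      -- further change to A.
      build-depends: B ^>= 1.1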

maurobalbi commented 2 years ago

@Ericson2314 Thank you for the clarification.

Since GHC.X.hackage is opt-in, users who do not use GHC.X.hackage will not be able to use the packages which have been fixed in GHC.X.hackage until fixed versions of those packages and all their dependencies are actually released on Hackage; however, we hope that by allowing Haskell packages to be made compatible with new compiler releases more quickly, we can help speed this process.

I took this part of the proposal and interpreted it the wrong way.

So in effect the proposal allows maintainers to patch "slow-movers" (without duplicating efforts) and join back into the fold (Hackage)? That would keep the risk of fragmentation pretty low then. I think that's a nice solution!

Ericson2314 commented 2 years ago

@maurobalbi Well, to flip it around, the fragmentation risk is that the new version of B never gets uploaded, so while the new A doesn't violate the letter of the law, it does violate the spirit, in that people may be using overlays for abandoned packages indefinitely.

I think this is OK.

If I recall correctly, we already have some informal ad-hoc procedures for finding new maintainers for abandoned packages. That is good. (I wouldn't want to formalize that process, which is inherently one of tip-toeing and special circumstances.) With GHC.X.Hackage, we have an "outlet" for not being blocked before any abandonment becomes acute enough to warrant such a drastic move. And if the time does come to give a package a new maintainer so we can stop using the overlay, there is no better credential for a would-be new maintainer than having maintained overlays for that package in the past.

Overall, GHC.X.Hackage is to be about collaborating more and being blocked less, whereas Hackage -- which has really top-notch trustees who blur these things -- is, in its original conception, more of an individualist, strict package "property rights" regime. I think the combination of all the above will be very fruitful and allow great community bonding with minimal hurt feelings or other drama.

hasufell commented 2 years ago

Based on our HFTT discussion, I think the sentiment was that this proposal doesn't have to extend its scope to cover CI etc., even though those are great additional ideas for further proposals (and if the authors want to extend the scope here somewhat, we'll have no objections!)

How are you going to engage unpaid contributors if they don't even know what work is required?

If the workflow is not really smooth (and again: submitting patches is not even half of it), then I don't see this taking off beyond GHC core devs and other highly engaged individuals.

Additionally, it'll make it harder for future proposals.

gbaz commented 2 years ago

I think the sentiment is that this is a proposal for the start of something, not for the whole of it, and even as is, it is an incremental improvement. But again -- the proposers are very welcome to expand scope, or have some goals as immediate and some sketched as future work. We're just not going to tell them they "must" do any such thing or we'll say no -- at least that's what people are leaning towards. The old adage about the perfect not being the enemy of the good and all that.

david-christiansen commented 2 years ago

Additionally, it'll make it harder for future proposals.

Can you explain a bit more of what you mean here? Which proposals do you see getting more difficult if something like this were implemented? Are there updates to this proposal that could mitigate this risk?

hasufell commented 2 years ago

My idea would be to create a proposal which describes the required process and interactions, not the required tools.

The tooling can then be an implementation of said proposal (possibly a separate proposal or less formal).

If a proposal with a specific technical solution is accepted, anything that comes after that will be kinda bound/limited to those decisions.

This leads to step-by-step implementation decisions that lack coherence/vision.

I don't think anyone here proposes this to be perfect, but things that concern the public workflow of unpaid contributors should look at the holistic picture and explain why this will engage contributors.

david-christiansen commented 2 years ago

Thanks, @hasufell, your position makes sense to me now. I misunderstood the previous comment quite severely!

david-christiansen commented 2 years ago

Here's my summary of the state of the discussion.

Problem Statement

The lack of backwards compatibility in GHC and base releases means that we end up in situations where we would like to use a new compiler version, but cannot because the code that our code depends on is not compatible with the new version.

In particular, some library L might depend on libraries B and C, which themselves depend on library A.

     _ B _
    /      \
L -<        >- A
    \      /
     - C -

In order to make L verifiably work on a new release, A, B, and C must all first be updated, a process that involves their maintainers. These maintainers are typically unpaid volunteers, who will review patches when they please. Most libraries are developed using Git, and most of these are on GitHub. Today, A's maintainer must both update A and perform a release before B and C can proceed, and B and C must both be updated before L can be updated.

Today, many developers work around this by using cabal-install or stack's ability to override the package index for specific versions, pointing them at specific commits in which the compatibility problems have been resolved. With this approach, a temporary fork or a pull request can be used as the source of a package, allowing projects further up the dependency chain to start porting before a release has been made. Each developer must figure these overrides out for themselves.

Proposed Solution

Today, head.hackage is a repository of patches against various Haskell packages that allow them to work with the ongoing development version of GHC. It is primarily used by the GHC developers. These patches are used to generate a Hackage overlay, which is a supplementary repository of packages that is used together with Hackage when working with GHC HEAD.

The proposal is to extend this mechanism to add overlays for specific released versions of GHC, as a stop-gap during the porting effort. Instead of each developer needing to configure their build tool to point at specific releases, they could point at GHC.X.Hackage to get patched versions of their dependencies.

Critiques

More work for maintainers

Maintainers have objected that this overlay process would increase their workload rather than decrease it. The fear is that they are expected to both fix the library and send a patch to GHC.X.Hackage. I'm not sure whether this is the intent of the proposal - from what I can see, the intention was that non-maintainers could contribute to GHC.X.Hackage, while maintainers could do a release on Hackage instead. The idea is to provide a way for B's maintainer to update A, and then not require C's maintainer to repeat the process.

Technical concerns

Multiple sources of truth

There is a concern that if GHC.X.Hackage persists long after the release of GHC X, then we end up in a situation where the ecosystem fragments. Some have suggested getting rid of GHC.X.Hackage some fixed amount of time after GHC X is released, but this would also cause tools like git bisect to break. I think it would be useful to put in more thought on how to incentivize GHC.X.Hackage being a short-term stop-gap.

Scope and problem

There has been a critique that this proposal will not create a common work-list of tasks to be done to update the world. The analogy is to a Linux distribution, where updates trigger massive rebuilds with failures reported, so that volunteers have an idea of what works and what doesn't.

Hackage has a few constraints that make this less feasible:

Even if we don't have a central source of knowledge, we do still have a distributed form of knowledge that results from individuals trying to update their code and getting stuck on dependencies. This would also lead to popular libraries being updated more quickly than unpopular ones, which seems useful to me.

It seems to me that this critique is primarily a request for an additional proposal, rather than a reason not to do this particular thing. Is that fair?


Is this a fair summary of the discussion so far?

Ericson2314 commented 2 years ago

@david-christiansen I would extend the problem statement to also mention that head.hackage is underutilized because it is only useful for GHC developers and not the community at large. Without trying to predestine a head.hackage-inspired solution, we can still say the situation today is thus "balkanized", and this is a tragedy that leaves both GHC HEAD and newly-released GHCs less battle-tested than they might be otherwise.

Ericson2314 commented 2 years ago

Accordingly, this is why I am not really worried about what technical form the initial GHC.X.Hackage takes.

To the extent it is hard to contribute to, regular users will use it without contributing back, but that is fine! Merely having more consumers of the thing will inspire a few people who don't mind the technical hurdles to work on it more anyway --- we can also argue that it's firstly the GHC team's responsibility to ensure GHC.X.Hackage is good enough that, upon release, there is proof to the community that the new GHC is usable / isn't too disruptive.

If later the GHC team feels overburdened keeping the thing up to date themselves, and the community is clamoring to help out, then we can refine the technical details to make it more convenient, but for the initial step I'm quite happy to just worry about the social problem of head.hackage only benefiting one group of people, GHC devs, not regular users.

hasufell commented 2 years ago

Accordingly, this is why I am not really worried about what technical form the initial GHC.X.Hackage takes.

Yes. I'm also leaning towards "just do it" without a proposal. A proper proposal that tries to put the pieces together can happen later.

chshersh commented 2 years ago

I personally wouldn't benefit from GHC.X.hackage explicitly (though I might implicitly; only time will tell), as I won't be using this overlay myself. The already-existing workflow works well for me. I also like how I can easily see the libraries from my cabal.project that still have pending patches. I don't see an easy way to achieve such a level of visibility with GHC.X.hackage.

However, since this is an HF Tech Proposal, if accepted it may actually make my life more difficult (I can speak only for myself) by:

I always have this feeling in Haskell that people are eager to dive into complicated tech solutions just to avoid dealing with humans. Yes, dealing with humans is not easy but the entire Haskell ecosystem is built by those humans. It's not sustainable for the entire community if you find it easier to maintain an entire overlay of patches and implement various strategies for retiring patches just to avoid asking maintainers "How could we help you?" or simply understanding that people might disappear.


I don't like derailing conversations around a proposal by suggesting different solutions. But if the ultimate goal is to release new GHC versions faster, then a proposal like the one below helps to achieve that goal in a more community-friendly way:

If I'm on my 2-week vacation and someone has a burning need to support new GHC in my library ASAP, I'd rather let them merge their patches directly to my project and release it to Hackage without my involvement at all instead of contributing patches to some external overlay because I'm not available.


Addressing some specific details,

Centralising a way for people to work together to generate patch sets across hundreds of packages might unlock that suppressed potential. And it might make less work for the package authors too, because they just have to merge an already-tested PR, rather than generate it.

I feel like Stackage already does this, and it's quite successful. I don't benefit from this process as it requires using the stack build tool (which is fair) and I'm using cabal-install for my personal projects. And I agree that having a centralised place to track all upstream changes would be a good thing to do, and I would indeed benefit from it. However, this feels like a different proposal.

the current situation is not too bad for a maintainer who:

  • Is competent and adept at navigating the Haskell ecosystem (can find broken upstream packages, diagnose them, make and submit fixes, and thinks this is not a big deal)

This can be solved by having better and more visible documentation. Lots of Haskell developers simply don't know they can depend on specific commits of packages, even though this feature has been available in the Haskell ecosystem for more than 6 years!

  • Does not have too big a dependency footprint (so that fixing their broken upstream packages is feasible)

The assumption here is that the maintainer of a package with lots of dependencies will wake up first and go on a crusade to fix upstream packages. But this is not always the case. I personally wait at least several months before I even try to upgrade to a newer GHC. Usually, the existing GHC works fine for me and I feel no urgency to upgrade.

  • Has the time to do the work (two days of labour for Matt!)

The same is true for GHC.X.hackage. Someone has to contribute a patch. It can be anyone who has free time and enough competency to write the patch -- not necessarily the person who needs it. Describing a workflow for contributing patches upstream (e.g. in the form of a blog post, or better, an official guides section) would help lots of people and would significantly lower the threshold for being an active member of the Haskell community.

Yes. I'm also leaning towards "just do it" without a proposal

Having HF support for this proposal puts it into a completely different perspective. For instance, one of the primary HF goals is to be the glue that connects the entire ecosystem. I don't feel this proposal supports that goal and, in fact, it goes against it.

Without the proposal and without official HF support -- go for it, you have my blessing 👼🏻

I'm not in a position to tell volunteers what to do in their free time. If some people want to implement GHC.X.hackage and others want to use it -- good for them 👍🏻 As always, people can do whatever they want in their free time. If it doesn't hurt me and makes life easier for someone else -- I'm happy for them and it doesn't bother me at all 👏🏻

Ericson2314 commented 2 years ago

@chshersh Well, let's start with the purpose of this proposal, which is simple:

Foster a way for the community as a whole to collaborate on patched packages so GHC releases are maximally usable on day 1.

How this is accomplished is completely secondary. If the means are too controversial I would advocate splitting the proposal so we can first agree on the goal and then decide on the means. Would that address your concerns?

I always have this feeling in Haskell that people are eager to dive into complicated tech solutions just to avoid dealing with humans.

Following what I said above, this proposal should instead be aimed at getting us to start dealing with other humans more, so we don't just suffer in isolation redundantly creating the same patched-package workarounds. It should be pro human interaction!