haskellfoundation / tech-proposals

The Haskell Foundation Tech Proposal Process

A tick-tock release cycle for GHC #34

Closed bgamari closed 1 year ago

bgamari commented 2 years ago

Here we propose to formalize a rather long-standing pattern in GHC's release history, specifying a "bimodal" release cycle and giving concrete support windows for these releases. We hope that this will provide better guidance to commercial users seeking to migrate between GHC versions and make it easier to plan the GHC release process.

hasufell commented 2 years ago

What if the LTS cycle would be a bit more dynamic and based on the subjective feeling of "backporting pain"?

E.g. if a new release includes major code refactors that will make backporting harder, this could be the start of announcing LTS support is going to drop after the minimum guaranteed support period.

parsonsmatt commented 2 years ago

I'm not sold on this.

For background and my remembered experiences (which are likely to be quite faulty):

I started learning/writing Haskell in ~2014. I'm pretty sure I used GHC 7.8.4 and it was great.

GHC 7.10 came out, and it was great. We got the Burning Bridges proposal, which was a pretty big change for the ecosystem to get used to. Meanwhile, we also had to let things settle out with the new version of the compiler.

GHC 8.0 had a bunch of changes, but I don't recall that the migration was particularly hard.

GHC 8.2 came out, and DerivingStrategies is the main thing I remember in the release notes. This was a solid release. I think it was relatively straightforward to get things working on this.

GHC 8.4 came out and included a Big Breaking Change: the Semigroup-Monoid proposal. This ended up being slightly less overall change than Burning Bridges, but there were only 197 days to get used to the compiler before the next major version came out. I don't recall spending much time on this version - too much stuff was broken for too long, and I ended up leap-frogging to GHC 8.6.

GHC 8.6 (197 days later) was fantastic. Solid, stable, excellent, lots of neat new features. QuantifiedConstraints, valid hole fits, DerivingVia, wow, lots of great stuff. Finally enough here to motivate a major change upgrade, and so it felt like the ecosystem migrated much more quickly than 8.4, which mostly seemed to introduce a breaking change.

GHC 8.8 came out 338 days later. I don't remember anything about this release, but it did seem like a ton of bugs kept popping up. Looking at the release notes, I think I might have migrated to GHC 8.8 for my work projects when 8.8.4 was released, but I may have just waited for GHC 8.10.2 which came out less than a month later.

GHC 8.10 is glorious. Been using it since 8.10.2 and we just switched to 9.2.2 at work this week. So, that's, what, August 2020 until May 2022? Almost two whole years!

In the meantime, GHC 9.0 came out. Then GHC 9.0.2 came out (my official Time To Migrate The Work Repo marker), but 9.2.1 had already been released a few months back. So I started prepping the work repo for a 9.2 migration, skipping 9.0 entirely.

Here we propose to formalize a rather long-standing pattern in GHC's release history, specifying a "bimodal" release cycle and giving concrete support windows for these releases.

So, in 2017, GHC decided to start major releases every 6 months. At this point, every other major release of GHC has been kinda hard to use in practice.

This has caused a problem: namely, we have a bunch of major versions/branches that we're trying to provide some backwards compatibility for. Backporting fixes and features is a huge PITA for this.

My maxim is "The best solution for a problem is to not have the problem to begin with." If we are experiencing a lot of pain and work and frustration from this aspirational 6 month release cycle, why not drop the aspiration?

I'm skeptical that codifying "This major version of GHC won't really be supported long" will help with the problem of slow community adoption. Why bother going through all the work to support GHC 9.N when I know GHC 9.(N+2) will be out in a few months anyway?

Why is this a problem?

Can the software architecture or design decision in GHC itself be improved, such that we can make more new-features minor version bumps, instead of major version bumps?

Can GHC be refactored to make supporting multiple branches easier?

Can the GHC team push back on breaking changes, and request that they be made in an additive fashion? And, if they can't be made additive, to include a path forward for the next similar change to be additive and not breaking?

Bodigrim commented 2 years ago

I’m not convinced that GHC users benefit from or even want frequent major releases. Is there data to support this claim? For the record, I would very much prefer 12 months cadence, and AFAIK many commercial users share this view.

Bodigrim commented 2 years ago

I mean, the premise of the proposal seems to be “we cannot have both 6 months release cadence and 18 months of support for each, so let’s agree to cut support for the half of releases early”. But another solution to this problem is to make major releases less frequently, and it is superior from other viewpoints (such as stability and library maintenance) as well.

tomjaguarpaw commented 2 years ago

Has a wider range of options been laid out and discussed somewhere, or is this the only option under consideration? It would be interesting to contrast to options like @Bodigrim's suggestion, or something similar, like yearly releases plus rolling "unstable" releases to get new features quickly into the hands of intrepid early adopters.

santiweight commented 2 years ago

Looking around at the old proposals @bgamari linked to, it appears that long release windows incur some kind of cost on GHC devs. But I'm not clear on why this is? If GHC devs require join points but cannot guarantee that these join points will be well supported, perhaps GHC should have some internal sense of join points that is not visible to commercial users.

Another point to make is that the naming scheme here is imo quite poor. Intuitively, 9.0 would appear to be a long-term support release since it is a major version bump. But actually the first release of a major version is not. Why not call these staging versions something like 9.0-RC1 (previously 9.0), since these release candidates are going to be deprecated just months after 9.0 (previously 9.2) comes out?

Having the current naming scheme is causing the Haskell ecosystem to support twice the number of GHC versions, half of which aren't even supported by GHC. I don't think that's sustainable in the long term at all...

gbaz commented 2 years ago

I think this proposal seems independent from the concerns about release cadence? It is clarifying an existing practice of certain releases being the longer-term stable ones, which is a practice that has occurred both when releases were over a year apart and when releases were under a year apart.

I would hate for this clarification to run aground on the basis that it does not address an entirely different set of concerns.

hasufell commented 2 years ago

I think this proposal seems independent from the concerns about release cadence?

I disagree. The proposal explicitly mentions release cadence:

While frequent releases helps to deliver new language and compiler features into users' hands quickly, high frequency comes at the expense of making work for packagers and commercial users who want longer support windows. Moreover, the benefits of frequent releases must be weighed against the considerable fixed cost inherent in making a GHC release.

And the problem it's supposed to resolve is about the churn that the newly introduced release cadence has caused:

While all users have expressed a desire to have predictable time-based releases, many commercial users, and library authors, have expressed concern that they cannot keep pace with a six-month major release cadence. In addition, GHC developers have been straining to keep pace with backports given that there have, until recently, been three active release series

And the natural counter argument was:

we don't have sufficient proof that the community really desires a 6-month release cadence
there might be less of a need to introduce LTS releases if the cadence for major releases is 12-24 months

gbaz commented 2 years ago

The proposal doesn't in my mind solidify any further a release cadence. As it notes, there has already been a "tick-tock" schedule informally even when releases were less frequent. It just motivates formalizing this further in part because of the increased cadence.

I don't see how accepting this proposal would make it any easier or harder to change the release cadence than otherwise. If there's a section in the proposal that appears to enforce any cadence, I think it should be changed! Otherwise, I don't think the very valid concerns in all directions about release pace are relevant to codifying the existing practice of having longer-term stable releases vs. otherwise.

And that said, I really would dislike making only the "intended" longer term releases "real" and keeping the others as experimental candidates.

There are certainly plenty of production users who do want to get their hands on bleeding edge changes and features to explore them. If these only exist in rcs where the ecosystem never adopts, that prevents them from doing so, and prevents the ecosystem from giving these "shorter window" releases a full workout -- meaning that issues with them won't get discovered during their lifecycle due to low adoption, which overall will hurt the ghc lifecycle, imho.

santiweight commented 2 years ago

@gbaz I don't think people are complaining about the clarification. I personally think that the proposal clarifies a process that is itself bad. As someone said above: "don't fix problems that shouldn't exist".

If these only exist in rcs where the ecosystem never adopts, that prevents them from doing so, and prevents the ecosystem from giving these "shorter window" releases a full workout

This is an odd sentiment to me. The whole point is that GHC currently is unclear: the non-LTS releases are release candidates. Look at the 9.2.3 bug fixes, which include a Windows linker bug, and that's for an LTS release.

I don't see why users wouldn't be allowed to use release candidates in cabal/hackage as ghc-9.0.1_RC1 (look at Scala for example). The point is: if something isn't ready, tell the user. Having a release candidate disguised as a release is not clear communication to production users, loses trust and causes hassle.

Ben's proposal attempts to clarify this situation by announcing that every other release is non-LTS. But when a non-HFTP-reading GHC user (which is many production users) sees a 9.X release on ghcup, they are going to assume it is ready to use. That is simply how it looks optically: 9.X looks like a real release. If a release is not ready for production, that should be abundantly clear in the release's name.

Say that this proposal is accepted, and the alternating even-versions policy is made loud and clear today. Then that knowledge will become folklore (because it is not nominally apparent), and another one of those "things you have to know" to use Haskell in production. That sounds quite unpleasant!

gbaz commented 2 years ago

This proposal nowhere says that a 9.X release is not ready to use. Every official release should be ready to use! However that "should" conceals a difficult fact, which is that there are always bugs in official releases of basically every software project, and so "only make bug free releases official" is not a valid release criterion. I worked at a company once where the execs ran the metrics and discovered that a lot of time was spent fixing bugs, and they presented to a room of some hundred-plus engineers that they proposed we stop writing bugs. Those executives didn't last very long.

In any case, as I understand it, the proposal is suggesting that some ghc releases have longer support windows than others. A non lts-release is not a non supported release, nor a known-busted release. It is just a release that will stop receiving bugfix backports and point releases sooner than an lts release.

If you believe this proposal is suggesting otherwise, please cite where in the proposal it says so, otherwise I feel like I am discussing the actual proposal, and you are discussing an imaginary (bad) one.

santiweight commented 2 years ago

I agree that my example of the bugs is misleading. Let me be more clear. It's more about how support windows are advertised in releases' names...

For me, a short-support-window release candidate is not a release at all. Take a look at the timeline of Scala 3's RC's. There were some features even being added during this stage. I found that the RC system provided a good way to understand what the intention behind each release is, and allowed the ecosystem to drop support for temporary releases in a manageable and obvious way.

What I am getting at is that 3-month support releases add to confusion and are unhelpful to commercial users and library maintainers who have to migrate to releases that will die moments later, without being explicitly told so. If such releases are important to GHC devs - I think this should happen internally, or that the names of these releases should indicate their support window.

Edit: perhaps this is a quibble over "release candidate"? I just want a name on the release to make it clear!

santiweight commented 2 years ago

This proposal nowhere says that a 9.X release is not ready to use. Every official release should be ready to use!

I wonder if this is part of the confusion. I don't personally think a non-LTS release is "ready to use". A GHC release should be well-maintained over a period of time, not an artifact of GHC's development process.

Ericson2314 commented 2 years ago

I also think this is not a good idea. It feels that instead of critiquing our existing ways of working, we are papering over them. I think we only need a new cadence policy, which is releasing more often; I think in struggling to get more releases out, we will diagnose and fix the actual issues.

(To be clear, my understanding of the status quo is that we are in fact releasing as often as we can support; we are like an injured runner again and again limping across the finish line. Continuing with that analogy, we need to pause, heal properly, and then run healthily and efficiently.)

I think the already accepted additional GHC devops engineer & GHC.X.Hackage idea are good enough first steps to improve the process, so simply trying to increase the release cadence without first pre-identifying issues we think we will run into will be enough.

hasufell commented 2 years ago

This proposal nowhere says that a 9.X release is not ready to use. Every official release should be ready to use! However that "should" conceals a difficult fact, which is that there are always bugs in official releases of basically every software project, and so "only make bug free releases official" is not a valid release criterion. I worked at a company once where the execs ran the metrics and discovered that a lot of time was spent fixing bugs, and they presented to a room of some hundred-plus engineers that they proposed we stop writing bugs. Those executives didn't last very long.

A non-LTS release is absolutely not ready to use for production. As the proposal says.

Intermediate releases, which will continue to have critical backports only until the next release

This means you may be running a buggy version with no backports to come and may not even know, unless you're aware of the intricacies of GHC's version scheme (which already is complex).

ghcup will have to mark these releases as experimental/discontinued or otherwise, but I'm not convinced this is a good enough strategy for all users. E.g. I don't know what stackage maintainers will do here. Will they skip these releases? Here are some relevant tickets about these concerns that I raised earlier:

So this definitely is a communication issue. I'd like to understand more about how this is communicated to end-users, e.g. on https://www.haskell.org/ghc/

In any case, as I understand it, the proposal is suggesting that some ghc releases have longer support windows than others. A non lts-release is not a non supported release, nor a known-busted release. It is just a release that will stop receiving bugfix backports and point releases sooner than an lts release.

Yeah, but I think it's a valid counter-argument to question the status-quo on this proposal, because it opens doors to an alternative solution.

bgamari commented 2 years ago

Thank you all for your feedback! I would like to emphasize this proposal is truly intended to be a request for feedback; when we instituted the six-month release cycle the GHC Proposals process was still being formed and therefore we had few avenues for soliciting broad, unbiased feedback regarding our release schedule. However, with the recent up-tick in communication within the community, we felt it was time to solicit proper feedback on our release policies and this proposal is our way of starting that discussion.

Before I address any specific points, I'd like to take a moment to review the motivations for the 2017 change. The impetus for reviewing our release schedule came from (primarily industrial) users' concerns that releases were too temporally unpredictable. This led us to the conclusion that we should switch to a time-based release cycle, which naturally presented the question of which period should be followed. Ideally we wanted a divisor of one year to ensure that the release fell consistently in the same part of the year. This left us with the options of one-year and six-months, as anything faster we felt would be untenable both for users and GHC maintainers alike.

From the perspective of a GHC maintainer, a one-year cycle has a few risks:

backports tend to grow in difficulty superlinearly in time
releases after one year of merging tend to be quite large, which increases the likelihood that regressions are unnoticed until late in the release cycle
new features cannot be adopted by users until up to a full year after they are merged, which can adversely affect contributor motivation

This, coupled with verbal feedback from a sample of users, caused us to lean towards a six-month cycle.

However, this led to the problem that commercial users are unclear which release they should "bet" on when planning their migrations. With the resources we have today, maintaining more than two stable branches at a time is challenging at best. To alleviate this, we are proposing here to offer "asymmetric" releases: some which are guaranteed to have a long backport window, and others which have today's six-month window.

However, I would be happy to discuss other options entirely, including a slower release cadence.

@tomjaguarpaw asks,

Has a wider range of options been laid out and discussed somewhere, or is this the only option under consideration? It would be interesting to contrast to options like @Bodigrim's suggestion, or something similar, like yearly releases plus rolling "unstable" releases to get new features quickly into the hands of intrepid early adopters.

No, there is an extremely large design space and I have made no effort to summarize even a small fraction of it. This is merely one possible design that I am presenting in the spirit of discussion. If others have concrete options that they would like to see mentioned, please do open a merge request against my branch (or perhaps open another proposal).

@maerward says,

A non-LTS release is absolutely not ready to use for production.

I strongly disagree. In fact, the non-LTS releases are essentially what we have today: the GHC team makes a release which we believe is bug-free and we then make a best-effort attempt to backport further fixes until the next release is out (generally around six months later).

The point of this proposal is to offer another release series with strictly stronger support guarantees. That is, for some releases we promise to maintain a full 18-months of backports. This is something that we have never done in the past.
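The branch-count arithmetic behind this part of the discussion can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: releases land exactly on schedule, time is counted in whole months, the long window is 18 months and the short one 6 months (the figures mentioned in this thread), and long-support releases alternate strictly with short-support ones. The function names are hypothetical and not part of any GHC tooling.

```python
def active_branches(month, cadence, window_for):
    """Count release branches still receiving backports at the given month.

    Release number i is assumed to ship at month i * cadence and to be
    supported for window_for(i) months (the end month is exclusive).
    """
    count = 0
    index = 0
    while index * cadence <= month:
        released = index * cadence
        if month < released + window_for(index):
            count += 1
        index += 1
    return count


def peak_active(cadence, window_for, horizon=120):
    """Largest number of simultaneously supported branches over the horizon."""
    return max(active_branches(m, cadence, window_for) for m in range(horizon))


def uniform_18(index):
    # Every release gets the full 18-month backport window.
    return 18


def alternating(index):
    # Even-numbered releases get 18 months, odd-numbered ones get 6 months.
    return 18 if index % 2 == 0 else 6


print(peak_active(6, uniform_18))   # 3 branches under a 6-month cadence
print(peak_active(6, alternating))  # 2 branches with asymmetric windows
print(peak_active(12, uniform_18))  # 2 branches under a 12-month cadence
```

Under these assumptions, both the asymmetric scheme and a plain 12-month cadence keep the number of simultaneously maintained branches at two, while a six-month cadence with 18 months of support for every release requires three.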

@Ericson2314 says,

I also think this is not a good idea. It feels that instead of critiquing our existing ways of working, we are papering over them. I think we only need a new cadence policy, which is releasing more often; I think in struggling to get more releases out, we will diagnose and fix the actual issues.

It is not entirely clear to me what you mean by "releasing more often". Do you mean we need more frequent (potentially breaking) major releases, more minor releases, or more of both? It's not clear to me what problem any of these would solve.

Rust does manage a six-week release cadence well, but I believe this only works for them since they have a much different approach to interface and language stability than us. Unless we significantly reduce the number of breaking changes we make, I am skeptical that anything of this nature would work for Haskell without considerable pain.

gbaz commented 2 years ago

Coming off a first pass HFTT discussion and the above useful comment, I would suggest this proposal be rewritten so that, instead of discussing "release cycle", it is a proposal for designating certain branches as LTS with promises of a longer range of bugfix backport support. Such a rewritten proposal would also need to motivate why such branches are needed -- i.e. constraints besides ecosystem readiness and code migration (in particular, conservatism at large companies and the need to thoroughly test stability, etc) that would prevent people from just "always trying to be on the latest". Those of us who have worked at some shops know why this is necessary, but not everyone has been in such situations.

I also think that there's evidently sentiment for a fuller release timetable discussion -- I'm not sure how to structure that! Maybe nine month cycles are ok after all? Maybe six month cycles, but with more actual "stability" guarantees in alternate releases? (I.e. the much discussed alternation between feature and polish releases?) I'm not sure if the actual current ghc dev process including branch management is conducive to that... I feel like lots of these questions need lots of input from ghc contributors who are familiar with all the various constraints...

Ericson2314 commented 2 years ago

I agree with more emphasis on the why. We're still figuring out to what extent this HFTT should be involved on how the sausage is made, but the big picture

In particular, conservatism at large companies and the need to thoroughly test stability, etc

So the way @gbaz elaborated this to me was that there are significant industrial users who simply can't upgrade GHCs that often because of their own rigorous QA process --- that is, even if GHC never had any intentional breakage and tried to be like Rustc, they would still only be able to upgrade on the cadence of the LTS releases.

This is news to me because mostly what you hear is people talking about GHC breakage preventing upgrades, not their own business needs preventing upgrades. (Of course, it makes sense that there would be a bias in that the people blocked externally rather than internally have a lot more reason to speak up!)

To the extent that this is the case, I think it makes sense to make this the main part of the motivation. I still remain extremely optimistic that GHC can be both a better vehicle for research and one with longer deprecation cycles (no surprise breakage), and it is easy to see something like this coming from a pessimism that today's amount of unplanned breakage is unavoidable and thus we are doomed to lots of back ports. The company QA process reason however has no pessimistic assumptions --- the QA is hard because it is defending against implementation changes, not interface changes. And indeed a world where we feel comfortable refactoring implementations behind the scenes confident we are keeping interfaces stable ---- a world where "low stakes" users can always upgrade but "high stakes" users will still have reason to be cautious ---- is a world I want.

Ericson2314 commented 2 years ago

It is not entirely clear to me what you mean by "releasing more often". Do you mean we need more frequent (potentially breaking) major releases, more minor releases, or more of both? It's not clear to me what problem any of these would solve.

Well some of the confusion around this is I think part of the problem! Right now we have

Same.Same.Change releases, which are smooth but also very limited to only backports.
Same.Change.1 and Change.1.0 releases, which are from master but often quite disruptive, e.g. coming with a breaking base and other boot library releases.
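To make the two kinds of release concrete, here is a minimal Python sketch that classifies a version bump by which components change. It assumes three-component version numbers written as tuples, and the labels "backport" and "from-master" are names made up for this illustration, not official GHC terminology.

```python
def classify_bump(old, new):
    """Classify a version bump between two (major, minor, patch) tuples.

    "backport"    : only the last component changes (Same.Same.Change)
    "from-master" : the first or second component changes
    """
    if len(old) != 3 or len(new) != 3:
        raise ValueError("expected three-component versions")
    if new <= old:
        raise ValueError("new version must be greater than old version")
    if old[:2] == new[:2]:
        return "backport"
    return "from-master"


print(classify_bump((9, 2, 1), (9, 2, 2)))   # backport
print(classify_bump((9, 2, 2), (9, 4, 1)))   # from-master
print(classify_bump((8, 10, 7), (9, 0, 1)))  # from-master
```

The sketch deliberately treats a change in the first component the same as a change in the second, since the list above puts both in the potentially disruptive category.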

We certainly don't want to e.g. break base more often if we were releasing more often, nor would we want to be stuck with the chore of endless backport releases, so we would be forced to come up with new workflows. This is good!

I would hope we can reach a point where, following what @Gabriella439 has talked about for libraries, breaking changes would be "spread out", so we would purposely stagger the end of deprecation cycles so they would fall in their "own" release whenever possible.

Rust does manage a six-week release cadence well, but I believe this only works for them since they have a much different approach to interface and language stability than us. Unless we significantly reduce the number of breaking changes we make, I am skeptical that anything of this nature would work for Haskell without considerable pain.

I am a big-time optimist with this stuff. Yes, Rust has big stability promises, but they have done complex breaking changes with e.g. non-lexical lifetimes, and perhaps other stuff still in the works. I think we can get most of their benefits without ceasing to be a vehicle of research (which would be sad!) but by having longer deprecation cycles, like they do.

Put another way, the latency of breakage should be much greater (opt ins -> warning if no opt-in -> opt out -> can't opt out), but the throughput of breakage need not be any less.
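The four-stage cycle in the parenthesis can be sketched as a small state progression. This is a hypothetical illustration only: the stage names, the `stage_at` function, and the idea of advancing one stage every fixed number of releases are assumptions made for the example and do not describe any existing GHC mechanism.

```python
from enum import Enum


class Stage(Enum):
    OPT_IN = 1                # new behaviour available only behind a flag
    WARN_UNLESS_OPTED_IN = 2  # old behaviour still the default, with a warning
    OPT_OUT = 3               # new behaviour is the default, a flag restores the old one
    MANDATORY = 4             # the old behaviour is gone


def stage_at(releases_since_introduction, releases_per_stage=1):
    """Return the deprecation stage a change has reached.

    Assumes the change advances one stage every `releases_per_stage`
    releases and stays MANDATORY once it gets there.
    """
    if releases_since_introduction < 0 or releases_per_stage < 1:
        raise ValueError("release counts must be non-negative and positive")
    order = list(Stage)
    step = releases_since_introduction // releases_per_stage
    return order[min(step, len(order) - 1)]


for n in range(5):
    print(n, stage_at(n).name)
```

Raising `releases_per_stage` lengthens the latency of any single change, while the number of changes in flight at once (the throughput) is unaffected, since each change walks the stages independently.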

Again, to reiterate my previous post, to the extent this proposal is narrowly targeted at industrial users that need to move slowly no matter what, I support it. But if the motivation moves beyond that into what I view as pessimistic assumptions about an industry--research tradeoff being inevitable, I am no longer interested. I would rather come together as a community to draft some big picture problem statements / agreed vision first before getting to individual solutions. To skip that and go straight to specific solutions feels like putting the cart before the horse.

(Finally, I am sympathetic that the current HFTT process is kind of unclear and still being worked out right now, so I don't mean to say it was "objectively wrong" to write a detail-oriented proposal taking a stab at some larger problems to get the ball rolling. This is just my opinion.)

Gabriella439 commented 2 years ago

Releasing less frequently doesn't necessarily improve quality. From my point of view, the only thing that improves quality is the proportion of developer time spent on changes more likely to break the code (e.g. new features that are potentially disruptive) versus changes more likely to fix the code (e.g. paying down technical debt).

In particular, suppose you release less frequently by decreasing the release frequency from 6 months to 1 year. You can't guarantee that the extra 6 months you added to the release cycle are spent increasing quality: the extra time could be spent breaking GHC more unless there is organizational discipline to only spend that extra time to pay down technical debt.

If the old policy worked before the switch to the 6 month cadence (and I'm not confirming whether it did or did not; I haven't reviewed the evidence), it seems to me more likely that the reason for the improved quality back then would be due to the dynamic release cadence rather than the infrequent release cadence. In particular, contributors to GHC would be sensitive to which periods of development activity had improved quality (e.g. lulls in new features) and would identify those as ideal points to cut releases.

However, from my point of view that points to the real problem, which is that most of the time the rate of breaking changes being introduced to GHC exceeded the rate of changes that fix GHC. In other words, the dynamic release cadence was so infrequent because only a small proportion of the time was the quality improving.

That implies that if you switch to a fixed 1 year release cadence you wouldn't necessarily improve quality because you can't guarantee that the 1 year release point happens during one of these rare windows of improving quality.

In my view, the actual way to improve the quality of releases is to increase the proportion of developer time spent on quality improvements, which could be done with a combination of:

Saying no to more GHC proposals (especially the potentially breaking ones), or at least spacing them out over time by establishing an informal "change budget" for GHC
Dedicating paid full-time developers to quality improvements (including, but not limited to, the GHC DevOps engineer)

With a fixed release cycle, if a greater proportion of the time is spent increasing quality then you have a greater likelihood of any given release being a high quality release (due to a greater likelihood of falling within a window of improving quality)

With a dynamic release cycle, if a greater proportion of time is spent increasing quality then you have a greater frequency of releases.

Ericson2314 commented 2 years ago

Another thing is that in my experience it is easier to cherry-pick iteratively than in one big leap. Maybe actually releasing the non-LTSes is not worth the effort, but I would rather cherry-pick through each of them so I face fewer conflicts all at once. (That's a non-linear improvement, given Git's weaknesses as conflicts become more severe.)

Given that, I am not so sure only supporting LTSes will actually be much easier! I'd be interested in hearing what sort of features are backported today and how people do it.

angerman commented 2 years ago

I'm not sure this is what I had in mind when I talked about tick tock cycles in the early HF tech calls. Or maybe I've just changed too much since then?

The problem I see from a commercial application of haskell is that you simply cannot upgrade to a newer compiler. You are almost guaranteed that your code will no longer compile. Maybe the code you wrote yourself will technically. The full transitive closure of dependencies though won't. It's the same reason why we need ghc.x.hackage. We often times can't test new releases against a large set of packages, because the surface language keeps changing.

We used to have 12mo cycles that broke the surface language, and some form of backporting in between. We tried to get out of the backporting moat by shortening release cycles. We however kept the surface level breakage.

If ghc releases would just continue to compile existing codebases just fine, upgrading to a new ghc would be much less of an issue. You could just swap in a new GHC, compile the existing codebase...

By virtue of GHC being a research compiler we see a lot of great things being thought up and implemented in GHC, that ultimately end up breaking existing code. From a purely getting-shit-done perspective, I don't care about most of that though. I do care about ongoing adaptation of the compiler to new/changing operating systems, toolchains, processors, ...

My inItial hope for the tick-tock cycles was to slow down breakage and focus for at least half a year on exclusively making GHC more stable, faster, ..., adapted to new platforms, cruft cleanup, ... This would lead to a fairly trivial upgrade every other release. You basically know you are just getting the same GHC you've been using all along, but with a significantly improved engine.

Maybe we can avoid breaking GHC's surface level language (the programs GHC accepts) for the next 4-6 releases and put any new surface changes exclusively behind feature flags? That would allow language researchers to experiment with new features, and eventually migrate those feature flags into GHC? We have this to some degree in the form of language pragmas, but not for e.g. the subsumption change recently.

My focus with this has always been to keep the accepted haskell code stable while still allowing GHC itself to improve significantly. I fully accept that I'm probably biased here. From my current role's perspective, I'm fairly certain we primarily want a better compiler (faster, more platforms, more control over linking, better HLS, ...) that just compiles the exact same code it compiles today.

Bodigrim commented 2 years ago

+100 to @angerman

Wrt language researchers, I’m not convinced they really need their stuff released often. Merged into the main branch - yes, likely, but afterwards they can use nightly builds without interfering with the wider community.

hasufell commented 2 years ago

However, from my point of view that points to the real problem, which is that most of the time the rate of breaking changes being introduced to GHC exceeded the rate of changes that fix GHC

I think that's a very interesting take. However, from my experience the 3 most recent disruptive changes to GHC that affected a lot of user experience were:

Darwin M1 support: a huge milestone, but new platforms naturally have a lot of bugs that can take years to iron out
Move from make to hadrian: important for GHC devs, but honestly caused a lot of issues for bindists, distributors and casual users trying to compile from source
Simplified subsumption

Now, I'm not sure if any of these really count as features. And I think only point 3 could have been avoided.

Additionally, fixing things will also lead to stuff breaking. The extremely important work about modularizing GHC will, imo, for sure introduce bugs. Any type of refactoring will.

I'm rather reluctant about drawing clear lines between "fixing" and "features" or attributing instability primarily to new features.

However I agree that this could be framed as a priority issue. But it might not always be clear what exactly will cause periods of instability.

Wrt release cadence:

What is clear to me is that more frequent releases put a lot of work on distributors and other tooling: ghcup, stackage, HLS, etc. At the same time, they provide valuable feedback for GHC developers.

I think the main issue here is really that testing new GHC versions requires so much work due to the coupling with base and interface changes. The only way to instigate testing then is to just release and wait for adoption. If this was less of a problem (e.g. because we decoupled GHC and base or managed to reduce excessive interface changes) and installing new versions was easier via e.g. a nightly channel, then there would be less release pressure.

But I'm not sure how well this would work wrt backporting load.

hasufell commented 2 years ago

We often times can't test new releases against a large set of packages, because the surface language keeps changing.

Just read @angerman's comment and realized we're saying the same thing, basically.

Yes, testing new GHC versions must become easier. And the release schedule might just be a secondary problem.

Ericson2314 commented 2 years ago

So what @angerman said makes sense to me, but is quite different from what this proposal says:

The proposal is what happens after a release: what ends up as backports.
What @angerman says is about what happens before a release, the pace of breakage.

This confirms my suspicion that "tick tock" is, frankly, too overloaded a label at this point: it means different things to different people, and we are better off not using it, so that it is clearer what is being proposed.

AllanKinnaird commented 2 years ago

If I might leave some mild thoughts: It seems to me that anyone engaged in serious commercial development will want change only when absolutely necessary. The fixing of bugs blocking further development or a major improvement in usability are clear signals. But nobody wants to imperil the stability of an existing project unnecessarily. Then, to develop the language, we need easy rapid changes to test possibilities. These two approaches are difficult to reconcile. As a relative newcomer to the Haskell community, it seems to me that there are two cliques - the commercial and the academic. I want to belong to neither and both. The commercial cannot survive without the intellectual academic source of the language. The academic will wither without the commercial support of the source of funding. The road to making Haskell a universal modern language must involve bringing these two arms together, and it is such a beautiful language, and so much fun to write, that surely that has to happen! Am I wrong?

david-christiansen commented 2 years ago

Here's my sense of the overall state of the discussion, for the sake of summary and to get us all on the same page.

Problem Statement

There seems to be widespread agreement that compiler updates roughly twice per year are difficult to keep up with, as the proposal describes. There seem to be two sources of difficulty:

GHC and base releases are typically not entirely backwards compatible, necessitating updates to both one's own code and to library dependencies.
Conservative users have testing and validation requirements for new releases that can be expensive and time-consuming. The fixed cadence is important because it allows people to plan for these costs, but it seems that even though the costs are predictable, it would be nice if they were also lower, at least for some users.

Today, the GHC team only promises to backport fixes until the next major release. In practice, it seems that some major releases continue to receive support for a longer period of time, and which major releases are considered stable and supported is something that experienced community members typically know, but it seems to emerge organically rather than being planned for and explicit.

Potential Solutions

It seems to me that three broad categories of solutions to this problem have been discussed in the thread:

Tick-Tock (this proposal)

This proposal suggests that every other compiler release might have a longer support window, allowing conservative users to have a 12-month release cadence with backports available for 18 months from release, while more risk-tolerant users have access to an additional release in between. Not only does this make features available a bit earlier, it also allows some parts of the community to start the process of updating code more quickly.

This is a strict improvement over the situation today, because today all releases are short-term support, with no promises to back-port fixes. In practice, some releases do seem to be more widely deployed and to receive fixes for longer - see 8.8 vs 8.10, for instance, but this is a bonus rather than the fulfillment of a promise.

On the other hand, under this proposal the version numbering scheme doesn't clearly communicate which releases are LTS, which is perhaps something to improve in a revised proposal. Additionally, it's not particularly clear what the relationship between the version number, the stability of the implemented language, and the stability of the implementation itself is.
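To make the support windows in this option concrete, here is a minimal sketch. It assumes a major release every 6 months, that every other release is LTS with backports for 18 months from release (the figure given above), and that the releases in between are supported only until the next major release. The 6-month figures and all names (`Release`, `schedule`, `supported_lts`) are illustrative assumptions, not part of the proposal text.

```python
from dataclasses import dataclass
from typing import List

RELEASE_INTERVAL_MONTHS = 6   # assumption: roughly two major releases per year
LTS_SUPPORT_MONTHS = 18       # from the summary: backports for 18 months
SHORT_SUPPORT_MONTHS = 6      # assumption: supported until the next major release


@dataclass(frozen=True)
class Release:
    index: int         # hypothetical counter, not a real GHC version number
    released_at: int   # months since the first release in the schedule
    lts: bool

    @property
    def supported_until(self) -> int:
        """Month (exclusive) at which backports stop for this release."""
        months = LTS_SUPPORT_MONTHS if self.lts else SHORT_SUPPORT_MONTHS
        return self.released_at + months


def schedule(n: int) -> List[Release]:
    """Build n releases; every other one, starting with the first, is LTS."""
    return [
        Release(index=i, released_at=i * RELEASE_INTERVAL_MONTHS, lts=(i % 2 == 0))
        for i in range(n)
    ]


def supported_lts(releases: List[Release], month: int) -> List[Release]:
    """LTS releases that are already out and still receive backports at `month`."""
    return [
        r for r in releases
        if r.lts and r.released_at <= month < r.supported_until
    ]
```

Under these assumptions, consecutive LTS releases are 12 months apart and each overlaps its successor by 6 months, which would be the window a conservative user has to migrate while still receiving backports.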

Slow down

Have a slower, yet still fixed, release cadence. Assuming a constant rate of change during development, this would lead to larger, yet less frequent, porting efforts among users of Haskell. This seems to me to be roughly equivalent to option 1, just with omitted short-term-support releases.

Additionally, there has been discussion of a point in between "Tick-Tock" and "Slow down", in which the intermediate releases do appear, but are clearly marked as being development releases, experimental pre-releases, or milestones, or something like that, thus communicating that conservative users should avoid them but allowing risk-tolerant users to both provide feedback and begin the process of updating code.

Backwards compatibility

As an alternative to making it easier to skip releases (and skip the right releases), another approach to solving the problem could focus on improving the backwards compatibility of GHC, to make new releases less of an effort for everyone. If code just worked with new compiler versions, then one of the major sources of upgrade costs would disappear.

At a first glance, this seems orthogonal to the other options being discussed. However, because these options are about managing upgrade costs, decreasing said costs could change the cost-benefit analysis on the others, so it seems worth keeping in mind here.

Have I missed any important aspects in this summary?

angerman commented 2 years ago

@Ericson2314 That is an interesting observation. I think it's more that my focus is on the perceived challenges I hope a tick-tock cycle would address, whereas this proposal focuses on how it wants to address them. I don't think the proposal focuses enough on not breaking the surface level language.

@AllanKinnaird your observation of the academic and commercial split in the haskell ecosystem is quite right. There certainly is that. Commercial applicants have to work with the tools available today, and they don't derive any noticeable value from code churn. It costs (non-trivial) time to upgrade and brings no discernible benefits. Training folks on new GHC features also takes time. At the same time, knowing in a commercial setting which features to use for which specific problem is a skill you acquire over a long period of time. The flexibility of the language means you'll need to make tradeoffs. Do you use monad transformers? An effect system? Type Families and Data Families? Do you want to use the lens library, and thus need everyone to be comfortable with these features?

I do not want to stop language exploration. I just want to put a significantly larger emphasis on ensuring that the surface level language stays the same (unless feature flags are provided). This will also greatly benefit the maintenance of GHC as there will be less churn all the time. It would allow us to have nightlies that can compile code we can compile today. We can thus also see if the compiler broke in other unexpected ways.

My view on @hasufell's three points is as follows:

Yes, we needed aarch64 support in GHC to keep it a relevant compiler. We did have some aarch64 support via the llvm backend, but that was slow and not well tested. Adding aarch64-darwin made us trip over some shortcuts GHC had taken in the past wrt subword types and C calling conventions. So this had to be retrofitted. The fact that GHC 8.10.7 can build for aarch64-darwin just fine with absolutely minimal surface level changes (you really just need to be correct about the argument size to C calls, and if you are not, things will break--that they didn't break before was just sheer luck!) should show that this change wasn't even a surface level change. The fact that doing subword support right also allowed us to remove other warts is what gave us a bit of churn at the low level; this (subword array ops, ...) however was the result of taking advantage of the new subword support in GHC. It was enabled by, but not mandated by, the new codegen.
Was the move from make to hadrian necessary? That's probably a topic to be debated. I still believe we should eventually just use cabal, and hadrian (to me) is just a vehicle on our journey there. It basically does what cabal does, but improves a few things to fit the needs of GHC more (and allow cross package parallelism). Again a change which slowly pushes us to do the right thing (isolate packages). I truly believe GHC should be built using the haskell build tool; not being able to build GHC with it is an acknowledgement of it not being good enough. We should fix cabal, instead of re-inventing it! The make system was just untenable to keep up with. Decades of cruft layered on top of each other; could we have started afresh with autotools and written a workable buildsystem? Probably. The same would likely hold true for any other build system we could have used. The biggest contention would be that each build system would somehow need to either read .cabal files or replicate them, or else the ghc libraries would have to use the build system's package description instead of .cabal files. While annoying to adapt to hadrian, it has no surface level changes, and for the casual haskell user is a completely invisible change. I'm not claiming it has no effects on packaging and distribution. But from an end user perspective, we still have binary distributions that install pretty much as before.
Simplified subsumption is the most recent one that made GHC reject programs it previously accepted.

So, while I think exploiting the new subword support was certainly something of value, it's a form of breakage I'd prefer we find alternative ways around. Hadrian to me is something that's completely in the GHC devs' ballpark and (again) has no visible effects on the end user (other than bugs?).

So, for a tick-tock approach, I'd have liked to see aarch64 support (without the subword library changes, hadrian, darwin linker fixes, performance improvements, ...) in the intermediate release (tock). Everyone from the previous release would have been able to upgrade to that version, get the improvements without their code breaking at all. The next tick cycle could have added the subword library changes, subsumptions (ideally behind a flag), ... changes.

However, due to the high rate of breakage and churn in the past, I'd like to see only tock cycles for the next 2-3 years. This could also amount to muted tick cycles that simply add functionality behind feature flags only, such that we have 2-3 years of "if it compiled then, it compiles now".

angerman commented 2 years ago

One more thing that occurred to me after I wrote the last message, and that might help explain my view on this. I think there are fundamentally two types of people working on GHC (some are in both camps): Language Researchers and Systems Engineers. You could think of tick-tock as giving each their own release. The language researchers' releases (tick) could break the surface language; the systems engineers' releases (tock) would never break the surface language but equip GHC with a better (faster, more versatile) engine. As an industrial applicant of haskell, you could always upgrade to the tock releases and just get a better compiler, but would need to be a bit more careful around tick releases if they break the surface level language. Tock releases should also be able to be continuously tested against a large body of existing haskell code. After all, they don't break the language. There is some similarity to backports here; it's just that backports are harder, and e.g. backporting just the NCG from 9.2 to 8.10 is significant work as the GHC codebase changed in between. Holding off on breaking the surface level language seems easier--to me.

tomjaguarpaw commented 2 years ago

I'd like to see only tock cycles for the next 2-3 years. This could also amount to muted tick cycles that simply add functionality behind feature flags only, such that we have 2-3 years of "if it compiled then, it compiles now".

I'd just like to pull this snippet out because I think it deserves serious consideration.

Ericson2314 commented 2 years ago

Conservative users have testing and validation requirements for new releases that can be expensive and time-consuming. The fixed cadence is important because it allows people to plan for these costs, but it seems that even though the costs are predictable, it would be nice if they were also lower, at least for some users.

We discussed this in the meeting, but honestly I am still skeptical. Is there any way we could run a poll, or ask key companies that might miss the poll? I would propose the following.

Suppose GHC didn't have intentional breaking changes. So no features would be landed knowing they broke code, and increased testing (think more head.hackage, more "whole-ecosystem CI") would be done to check this, but of course mistakes can and will still happen.

How often would you upgrade? Would that be more often than today?

If that cadence is longer than GHC's release process, would you skip releases to stay caught up?

If a new GHC had a bugfix you suddenly needed, would you rather accelerate your planned upgrade cycle to get the most recent stable version with the bug fix (where we assume it's back-ported), or backport it further to the release you are currently on?

If you are skipping releases per the earlier question on cadence, would it be useful if other industrial users skipped the same releases as you?

Notice I tried to avoid any jargon words like "LTS" "tick tock" etc. to try to avoid knee-jerk reactions one way or the other. The answers to these would be immensely clarifying to me.

david-christiansen commented 2 years ago

I can reach out to industrial Haskell users with those questions on Monday.

Ericson2314 commented 2 years ago

Awesome! Thank you @david-christiansen!!

gbaz commented 2 years ago

Have I missed any important aspects in this summary?

@david-christiansen I think you need to disentangle the two different uses of "tick-tock" -- with the other being what @angerman has suggested. This proposal is "alternating LTS" and the other idea would be "alternating language and systems releases".

I will note that in my experience it is systems-y changes that have caused as much difficulty as language-y changes in migrating industrial projects to new GHCs. In particular, subtle changes to garbage collection, inlining semantics or treatment of rewrite rules, or perhaps other rts behaviors or linking behaviors can cause code which worked well before to degrade in performance at runtime, or to tickle issues with concurrency interacting with linked C libs, etc. Often it is not necessarily the case that this is even a regression. The old code often "worked ok" almost accidentally. But nonetheless, getting code to compile with changes to base, etc. is the tedious part at times, but not the hard part, which is validating performance and reliability at scale over a long span of time, and then diagnosing the subtle issues that arise when that validation fails.

I do also want to note that angerman's comments in particular are useful in pointing at external constraints on ghc releases that are semi-uncorrelated to either language or systems features. Apple keeps changing os apis, its distros of compilers and linkers, its security conventions, and sometimes full architectures. Windows is now in the habit of making breaking changes with new os versions as well. So even if we did not have any language or systems improvements, there would be a certain pace of releases necessary to just keep up with the underlying os- and architecture-induced churn such that ghc could keep working well on the current generation of machines and oses.

Ericson2314 commented 2 years ago

@gbaz

@david-christiansen I think you need to disentangle the two different uses of "tick-tock" -- with the other being what @angerman has suggested. This proposal is "alternating LTS" and the other idea would be "alternating language and systems releases".

FWIW, it was my understanding that each bullet in the problem section was supposed to call out one of those uses of "tick-tock".

I do also want to note that angerman's comments in particular are useful in pointing at external constraints on ghc releases that are semi-uncorrelated to either language or systems features. Apple keeps changing os apis, its distros of compilers and linkers, its security conventions, and sometimes full architectures. Windows is now in the habit of making breaking changes with new os versions as well. So even if we did not have any language or systems improvements, there would be a certain pace of releases necessary to just keep up with the underlying os- and architecture-induced churn such that ghc could keep working well on the current generation of machines and OSes.

I agree underlying OS stability is a huge issue --- if nothing else it has caused more suffering for @bgamari than most of us can imagine. But I suspect we may not have the same ramifications of this fact in mind.

If the OS is so unstable, why are users upgrading it? A big lesson of the macOS struggles in particular is that Apple's release engineering is not even good. On one hand, they consciously make some breaking changes, and that is Apple's prerogative. On the other hand, there are tons of issues that don't seem intentional but exist simply because Apple does not care, and those issues slip through QA.

I thus posit that anyone who is intentionally holding back for extra stability is also on e.g. a Red Hat LTS; that is, they are trying to keep their entire system stable and well-tested, not just the Haskell tip of the iceberg.

Conversely, anyone that is keeping up with macOS and Windows releases either thinks the benefits outweigh the costs, or feels they have no choice. They are prioritizing other things over stability.

Correct me if I'm wrong, but I assume one of the main categories of backports LTSes were intended to receive are platform support back-ports. But according to the above analysis, if GHC isn't making heavy intentional breaking changes, it is precisely the users on the LTS that shouldn't need new platform support!

Also, so long as the underlying systems are increasingly unstable (and I don't see any sign of Apple or Microsoft retreating from "move fast and break things"), it is in our interest to support as few versions as possible. To the extent the underlying system is unstable it behooves us to be more stable.

angerman commented 2 years ago

@gbaz just to clarify, I did not intend to say that system changes are bug-free. If that came across as such, I'm sorry; that's not what I meant. I agree that there can be complications with systems changes. However, if the surface level language does not change, you can just revert back to the previous compiler. This is much less involved from an operations standpoint than migrating a complete codebase over. The latter involves man-hours of manual build/fix cycles. The former ideally requires only dropping in a new binary, rebuilding the code and running all test/regression suites. For a massive codebase this is still something a single engineer can do in a day or two (depending on compilation time and test/regression suites). Migrating a major codebase where parts of the codebase are not even known to every engineer is significantly more time consuming.

I do like your delineation. Let's call this proposal LTS release proposal and discuss tick-tock as a separate one.

One thing to ponder though: what is the difference between backporting system changes into an LTS compared to a separate release that contains only those system changes? Wouldn't the LTS backport suffer from the same system-y breakage?

hasufell commented 2 years ago

I agree that there can be complications with systems changes

And I want to clarify that I didn't mean that M1 support brought accidental or unwelcome breaking changes. I was just challenging the "feature vs fixing" semantics brought up by @Gabriella439.

I think with your proposed "surface language changes" vs "system changes" semantics we seem to be getting closer to a point where we can reason and argue about this, because I think the acceptance of disturbance between the two may indeed differ considerably.

juhp commented 2 years ago

To be serious about tick-tock the different major versions need to be clearly distinct.

I would suggest using odd submajor versions for shorter-lived major versions and even submajor versions for LTS:

eg the short GHC 10.1 major version would have a few minor versions released within a 6 month period leading up to the first long 10.2 version minor release - after 6 months 10.3.1 could appear. (10.1 could be preceded by a long 10.0 major version of course.)

In this way users desiring to change ghc major version less frequently could go from GHC 10.0 to 10.2 and/or 10.4 (skipping the shorter 10.1 and 10.3 versions). Many Linux distributions might also want to do this perhaps: most are already struggling to keep up.
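As a minimal sketch of the even/odd idea above: the helper names `is_lts` and `lts_upgrade_path` are hypothetical, and the rule encoded here is only the scheme suggested in this comment (even submajor means LTS), not how GHC version numbers are assigned today.

```python
from typing import Iterable, List


def is_lts(version: str) -> bool:
    """Classify a version string under the suggested even/odd scheme.

    The second component (the "submajor" version) decides:
    even means long-lived (LTS), odd means short-lived.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least MAJOR.SUBMAJOR, got {version!r}")
    submajor = int(parts[1])
    return submajor % 2 == 0


def lts_upgrade_path(versions: Iterable[str]) -> List[str]:
    """Keep only the LTS versions, preserving the order they were given in."""
    return [v for v in versions if is_lts(v)]


# The versions used as examples in the comment above.
print(lts_upgrade_path(["10.0", "10.1", "10.2", "10.3", "10.4"]))
```

A conservative user or distribution would then only ever consider the versions this filter keeps, while a nightly-style channel could move through all of them.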

I am also wondering what this could mean for Stackage: should Stackage LTS only track GHC LTS versions? That seems a bit slow, but it could work - meanwhile Nightly could move through all short and long major versions. This would certainly lead to more ecosystem stability, though slow down major development of packages a bit, with a clear stability roadmap.

The "notion" of support here is also confusing - to me it is kind of the opposite of what one normally wants in terms of pushing out software updates. I don't really want to have regular monthly GHC LTS minor releases forever; rather, the fixpoint should converge on stability sooner rather than later (unless the continuous ABI/hash changing problem can be solved once and for all).

ps Also compiler/language "LTS" seems very short-lived (months not years) compared to operating systems and other kinds of really long-term supported software.

bgamari commented 2 years ago

I do not want to stop language exploration. I just want to put a significantly larger emphasis on ensuring that the surface level language stays the same (unless feature flags are provided). This will also greatly benefit the maintenance of GHC as there will be less churn all the time. It would allow us to have nightlies that can compile code we can compile today. We can thus also see if the compiler broke in other unexpected ways.

For what it's worth, I think many of us would agree with this. The lack of breakage helps everyone, users and developers alike. However, getting there will require a concerted effort across the ecosystem, from GHC proposals process to the core libraries committee to GHC code reviewers.

So, for a tick-tock approach, I'd have liked to see aarch64 support (without the subword library changes, hadrian, darwin linker fixes, performance improvements, ...) in the intermediate release (tock). Everyone from the previous release would have been able to upgrade to that version, get the improvements without their code breaking at all. The next tick cycle could have added the subword library changes, subsumptions (ideally behind a flag), ... changes.

Agreed. In my mind 8.10 was a prototypical intermediate release. We backported a good number of larger features to it to enable users to continue using it for a long period, even if this required considerable cost from GHC developers. This is the same trade-off this proposal suggests that we make regularly in the future.

To be serious about tick-tock the different major versions need to be clearly distinct.

A fair point. Currently the even/odd distinction is already used to denote master and release-branch compilers, although I suppose we could revisit this.

bgamari commented 2 years ago

Thanks to everyone for their feedback. A quick update on this:

In my view this proposal tries to address a real concern: When we instituted the six-month release cycle we did not have the benefit of the feedback that a formal proposals process provides. We rather designed a time-based cycle that we felt was a reasonable trade-off given the informal feedback that we had heard, in order to address the concern that GHC release timing was too unpredictable. Now that we have a formal mechanism for collecting feedback, I think it is appropriate to revisit this topic and have a more structured discussion on release timing, with the goal of finding a release schedule which works for all of our users.

However, my time to facilitate such a discussion is rather limited until August. Consequently, I plan to revisit this subject later in the summer.

gbaz commented 2 years ago

@bgamari I'm sure you won't get to this before our tech working group Thursday, but wanted to give a gentle ping on both this and the head.hackage proposal?

gbaz commented 2 years ago

I look forward to a structured discussion on the various users of GHC and their needs. A little taxonomic device I thought of, which can help to organize impact consideration, is a pair of grids -- one for development, and one for education. Each lets us categorize use-cases along two axes, so we can get eight sorts of cases to consider. Different proposals may be directed to solve specific needs of only one or another quadrant, and can be weighed by whether their impact is positive, negative, or indifferent in each quadrant. (And people may of course show up as individuals in multiple quadrants, but their needs when acting in the capacity of each quadrant are what this helps specify.)

       Developer Maintainer
      ┌─────────┬─────────┐
    L │         │         │
    i │         │         │
    b │         │         │
      │         │         │
      ├─────────┼─────────┤
    A │         │         │
    p │         │         │
    p │         │         │
      │         │         │
      └─────────┴─────────┘

People may be developing new code, or maintaining existing code. And they may be acting more as application developers who are solely consumers of libraries, or as library developers.

So I would speculate, for example, as a vast simplification, that developers (lib and app both) tend to benefit more from faster release cycles, so they can immediately make use of new features. Meanwhile, lib maintainers feel more burdened by such cycles, since they need to keep updating their packages to keep up. App maintainers, on the other hand, may, especially in commercial settings, be more comfortable remaining on a fixed version of GHC and updating more rarely.

Now we consider the situation for education:

       Teacher   Student
      ┌─────────┬─────────┐
    B │         │         │
    o │         │         │
    o │         │         │
    k │         │         │
      ├─────────┼─────────┤
    C │         │         │
    l │         │         │
    a │         │         │
    s │         │         │
    s └─────────┴─────────┘

As a huge generalization, and purely for example: authors of books (in the teacher/book quadrant), would want slower language evolution, so their books do not go out of date. Teachers of classes might also want to not frequently update their course materials. Students of classes, on the other hand, might be very indifferent to the burden this places on teachers, and would instead appreciate using the "most evolved language" possible.

As a further example, I might consider that the ghc.x.hackage proposal targets lib maintainers and app developers. In particular it aims to ease the process of lib developers keeping up with new ghc releases (and to alleviate the burden on them by allowing others to more easily "fill in the gaps" for them temporarily), and hence also to make the "full ecosystem" more immediately suitable for active development, which then most benefits app development.

In the particular case of this proposal, I would say it is targeted at the developer grid and not the education grid -- and within that grid is focused on app developers and maintainers. I.e. for people who want to target a given ghc for their applications, or figure out which ghc to stably develop against, having some ghc's as the intended "lts" versions can provide guidance. Meanwhile, the "alternating language and systems release" proposal would more target the lib and app maintainer quadrants -- by providing longer periods of language level stability which alleviates maintenance burdens.

In any case, I only wanted to sketch the taxonomy here and provide some examples of how it might clarify our discussions -- not to pre-empt those discussions themselves, so I'll stop here :-)

hasufell commented 2 years ago

@gbaz I propose (New = new projects, Old = large existing codebases)

   Industry Researchers
  ┌─────────┬─────────┐
N │         │         │
e │         │         │
w │         │         │
  │         │         │
  ├─────────┼─────────┤
O │         │         │
l │         │         │
d │         │         │
  │         │         │
  └─────────┴─────────┘

IMO, there's only one quadrant here that really cares about the pace GHC pushes out new language features and that is researchers writing new code.

david-christiansen commented 1 year ago

I greatly appreciate the meta-discussion by @gbaz and @hasufell , and I think we should look into using a technique like this on future proposals.

This particular proposal hasn't seen any responses for some time, so I'm closing it. If anyone would like to bring it back and incorporate feedback and discussions, please do so!