We propose to divide the modules of base into three groups
s/divide/classify/, perhaps? "divide" makes it sound like you're intending to carry out some operation on the source repo, but I think you mean just "publish a classification". In the subsequent section you use "declare", and explain that the "division" is done out-of-band. I think it might be wise to foreshadow this by changing "divide" to "classify", to avoid misdirecting the reader.
There are two proposed axes of distinction between external and internal:
I strictly disagree with the second point. If a change is breaking, then it requires a major version bump. I don't care whether the module is marked as "implementation specific internal". If you expose it, then it's part of the public API.
If you want to fix that issue, introduce a new package (ghc-base) and version it properly. Then if base doesn't re-export affected stuff from ghc-base, it can stay at the same major version.
And as far as I understand, it's the first point which causes most friction. GHC devs need to change something in GHC-specific stuff, but have to go through the CLC. On that I agree; the clarification will make everyone's life easier.
But I repeat: do not bundle these two unrelated considerations. As an (occasional) user of low-level base stuff, I do care about it being versioned as the PVP specifies. Otherwise I'd be forced to use tighter upper bounds on base. If you want to version GHC specifics separately, put them into a separate package. Until then, version them properly.
Finally, if the maintenance approach for truly internal modules (i.e. the possible future ghc-base) is
The GHC team makes no effort to maintain the stability of this API
then the whole thing is not worth doing. You probably want to word that differently.
If a change is breaking, then it requires a major version bump.
As a matter of principle, I have no objection to continuing to bump the version of base when internal declarations change in a breaking way. This is one reason why I support #145.
However, until this happens (which could happen in time for GHC 9.8, if we move quickly) we will need to be pragmatic (as we have been in the past). Concretely, this means accepting the (small) possibility that breaking changes in internal modules may occur in minor version bumps. Of course, we will do due diligence to minimize the damage, but if there is a change needed in an internal declaration for a soundness fix (or something of similar severity) then we will make the change with a minor bump. Doing otherwise would merely cause undue harm to users for no perceivable benefit.
@bgamari
(as we have been in the past).
Can you provide a short summary of such breaking changes deep in base which should technically have been major bumps but weren't? I don't remember any in recent years.
EDIT: I know that the ghc library doesn't try very hard to be PVP compliant, as a lot of stuff is exported, and there are many fixes and changes in ghc itself between minor versions. But that's another story: it's not base.
Can you provide a short summary of such breaking changes deep in base which should technically have been major bumps but weren't? I don't remember any in recent years.
Such cases are (and should be) rare. Producing a comprehensive summary of these changes would require a fair amount of effort, not because there are numerous cases but because they tend to be small and subtle. On a cursory look through the last few releases I was unable to find a single one.
However, the principle stands: if we need to change, e.g., the type of GHC.Desugar.toAnnotationWrapper in a minor GHC (and therefore base) release to fix a soundness issue, then we should do so without putting the rest of the ecosystem through the pain of a major bump in the base version.
To reiterate, ideally the likes of GHC.Desugar wouldn't live in base at all, but sadly we don't yet live in a world where that is true.
To reiterate, ideally the likes of GHC.Desugar wouldn't live in base at all
Does that module need to be public? If it's used in desugaring, can't GHC access symbols in non-public modules? At least TH can, AFAIK.
EDIT: I understand that you are trying to be prepared for very unexpected things. But if you find it hard to come up with a convincing example, maybe that's a sign that asking the CLC about the occasional slight PVP sin isn't really that bad?
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
Yes, this is precisely the pragmatic approach that I was suggesting in the original comment.
s/divide/classify/
Quite right. I've made this change.
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
Yes, this is precisely the pragmatic approach that I was suggesting in the original comment.
But that is different from what the proposal says, especially as the proposal says that GHC.Stats is internal, meaning that it might change in any minor GHC release. I don't agree with that; such a soundness fix might need to wait for a major GHC release, until GHC.Stats is moved to ghc-base.
The bar for making a breaking change (in base, even in "internal" modules) in a minor GHC version should be set high, and that should be said explicitly in the proposal, not in the discussion.
I think one thing that might be good to draw out is how we will communicate the status of base modules going forward.
How would a first-time contributor to base find out whether the module they want to modify needs a CLC proposal or not?
Some possibilities:
Another question is what the procedure will be for modules moving from one group to another, or for the introduction of new modules. For instance would adding a new internal module need CLC review? What about moving a module from internal to external?
But that is different from what the proposal says, especially as the proposal says that GHC.Stats is internal, meaning that it might change in any minor GHC release. I don't agree with that; such a soundness fix might need to wait for a major GHC release, until GHC.Stats is moved to ghc-base.
To be clear, this proposal explicitly /does not/ propose to split out the internal modules of base into ghc-base. While I believe that doing so would be a reasonable step in the future, the goal of this proposal is merely to introduce a distinction between internal and external modules. We are currently proposing that the status of a module be captured in its documentation.
As long as internal modules remain in base, we believe that it is reasonable to reserve the right to evolve internal modules outside of the PVP. However, we also agree that the PVP holds value and therefore will naturally take care when exercising this right.
I think one thing that might be good to draw out is how we will communicate the status of base modules going forward.
Yes, clear communication will be essential. I have added a small note suggesting that we introduce a new Haddock field to mark internal modules. Adding a mention to the MR template also seems like a reasonable idea.
Another question is what the procedure will be for modules moving from one group to another, or for the introduction of new modules. For instance would adding a new internal module need CLC review? What about moving a module from internal to external?
This proposal seeks to establish the /concept/ of an internal module and a roadmap for which modules we would like to move to internal status in the future. While it would be great if the CLC would summarily accept a swath of these changes, we are willing to propose concrete changes in future, smaller CLC proposals if necessary.
@bgamari are you suggesting this as a stop gap or a permanent solution?
@bgamari are you suggesting this as a stop gap or a permanent solution?
I am suggesting this as a potentially-permanent solution. That being said, I do believe that #145 would be a considerable improvement and would love to see it adopted in the future.
Do note that I have dropped the section entitled "where to place internal declarations" as it contained language from the editing process which was not intended to be in this proposal (namely the renaming of existing modules).
I like the idea of code being marked as stable/unstable, however: no docs > wrong docs > docs of indeterminate correctness
So this can only work if the "this module is stable" documentation is kept accurate somehow. I'd be strongly in favour of this proposal if there was a mechanism to do so, and undecided if not.
The reason I'd be undecided without such a mechanism is that we already have a stability field in Haddock, and it is widely ignored anyway.
(The other question is for some sort of linting about whether unstable modules are used in a project, but that's not in scope of this proposal afaict.)
@bgamari are you suggesting this as a stop gap or a permanent solution?
I am suggesting this as a potentially-permanent solution. That being said, I do believe that #145 would be a considerable improvement and would love to see it adopted in the future.
I see.
In that case I'm leaning towards -1 on this proposal, since I believe violating PVP in the standard library sets a bad example (and as indicated in the PVP ticket, I'm also against formalizing it into PVP).
To me, the only pragmatic way forward is the base split.
I'm starting to find it hard to follow all the inter-connected tickets about this, though.
For instance, the GHC.Base.mapFB function is a necessary exposed part of the fusion framework for map, but one which GHC's authors never intended users to call.
@bgamari could you please elaborate on this? There is no necessity to expose functions used by the list fusion framework, and indeed no other function of it (e.g., zipWithFB, mapAccumLF, unwordsFB) is exposed. It seems mapFB is just an oversight and unfit to support your point.
@bgamari could you please elaborate on this? There is no necessity to expose functions used by the list fusion framework, and indeed no other function of it (e.g., zipWithFB, mapAccumLF, unwordsFB) is exposed. It seems mapFB is just an oversight and unfit to support your point.
Yes, this is a fair point; mapFB and friends indeed need not be exposed, as they are only used in rules defined in GHC.Base; this could be resolved with a proper export list. However, even without leaving GHC.Base there are other examples; for instance, maxInt exists to only serve other modules in base.
Looking beyond GHC.Base, there are the many exports of Data.Typeable.Internals (e.g. mkTrType), the array implementation in GHC.Arr, and the exports of GHC.Err.
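To make the fusion point concrete, here is a minimal sketch of the fold/build pattern that a helper like mapFB supports. The names (myMap, myMapFB) are illustrative rather than the actual GHC.Base definitions, and the real rules differ in detail:

```haskell
module FusionSketch (myMap) where

import GHC.Exts (build)

-- The helper: turns a mapping function into a "cons-like" step for foldr.
-- It is referenced by the rewrite rules below; because the rules live in
-- the same module, it does not have to appear in the export list.
myMapFB :: (b -> l -> l) -> (a -> b) -> a -> l -> l
{-# INLINE [0] myMapFB #-}
myMapFB c f = \x ys -> c (f x) ys

myMap :: (a -> b) -> [a] -> [b]
{-# NOINLINE [0] myMap #-}
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

{-# RULES
"myMap"     [~1] forall f xs. myMap f xs = build (\c n -> foldr (myMapFB c f) n xs)
"myMapList" [1]  forall f.    foldr (myMapFB (:) f) [] = myMap f
  #-}
```

The rules can still fire in client modules even though the helper is not in the export list, which is why hiding such helpers is generally possible.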
However, even without leaving GHC.Base there are other examples; for instance, maxInt exists to only serve other modules in base.
(Assuming for a moment that maxInt is potentially unstable.)
The only place maxInt is used is GHC.Enum; it could have been defined there and not exposed outside. Even if it were used in multiple modules, it could have been defined in an other-module and never exposed to users.
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed (and more specifically - exposed from base and not from elsewhere), or is it a historical accident / negligence?
Edit: this comment misunderstood @Bodigrim's above question
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed, or is it a historical accident / negligence?
Yes, there are places where we must expose declarations which otherwise shouldn't be used. Off hand I can think of at least three concrete reasons why this happens:
(GHC.IO.mkUserError)
(Data.Typeable.Internals)
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed (and more specifically - exposed from base and not from elsewhere), or is it a historical accident / negligence?
These cases are indeed largely due to historical accident. In many cases modules have been exposed via exposed-modules which should rather have been hidden in other-modules. These internal modules are often necessary either to break up module import cycles or to satisfy the needs of generated code, and were not intended to be used by end-users.
Is it difficult to keep modules which have been exposed due to historical accident stable (or at least avoid breaking changes)? And to contain new work to other-modules?
@Bodigrim it very much depends upon the case; this is essentially what I try to capture in the "stability risk" column of the spreadsheet. Anything assessed to be 0 or 1 will likely be quite easy to keep stable (and consequently I generally propose that these be "stabilized" except in cases where there is no evidence of external usage).
Modules assessed to be a higher stability risk would be harder. In some cases we propose that these be stabilized despite this (in particular, in cases where we find high degrees of dependence in the ecosystem). However, there are certainly a number of modules which we would prefer to hide.
Regardless, yes, we will need to be more careful to contain new work to other-modules in the future.
What does Stability risk grade 3 stand for?
These assessments are qualitative and fairly subjective. Grade 3 essentially corresponds to things that not only are likely to change but that we would also actively discourage users from relying on.
I like the proposal overall. Here's my constructive feedback.
Incidentally, the Stability Haddock field of a module is not the same as the Internal vs External distinction. A module could be External (i.e. designed for external callers), and yet experimental and not yet stable. That seems to be the intended purpose of the Stability field, although it is not well described anywhere (please tell us there is a good specification).
We propose to document internal modules via a yet-to-be-named Haddock field.
I don't think we need a new field. I suggest reusing Stability and introducing new values if necessary. This is only a matter of documentation. Currently, the relevant section in the Haddock docs doesn't specify the meaning of stable, experimental, provisional and internal.
I suggest opening a PR to Haddock with the description of these four fields as the immediate next step.
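As a minimal sketch of what reusing the field could look like (the module name GHC.Foo.Internal and the internal value are hypothetical, pending the Haddock documentation PR):

```haskell
-- |
-- Module      : GHC.Foo.Internal
-- Copyright   : (c) The GHC Team
-- License     : BSD-3-Clause
-- Maintainer  : ghc-devs@haskell.org
-- Stability   : internal
-- Portability : non-portable (GHC extensions)
--
-- Implementation details of a hypothetical GHC.Foo; not part of the
-- PVP-stable API of base.
module GHC.Foo.Internal () where
```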
The proposed actions fall into a few broad buckets:
- Internalize, which denotes the GHC developers' intent to open a CLC proposal in the future to move the module from External to Internal.
- Hide, which denotes the GHC developers' intent to open a CLC proposal in the future to move the module from External to Hidden.
Currently, the Base stability spreadsheet wants to either hide or internalize a total of 38 modules. Creating 38 CLC proposals (one per module) sounds like a huge amount of work.
I suggest at least splitting this process into batches. In fact, I believe we can agree on most of the modules directly in this CLC proposal.
The question of GHC.Exts
GHC.Exts sounds like a PITA. My recommendation would be to:
GHC.Exts immediately today
To make the discussion concrete, we have characterized each of the exposed modules in the GHC.* namespace along three axes:
My general view:
I would like to provide my comments on all the other individual modules:
- GHC.Arr -> hide: The array package depends on base and imports GHC.Arr, so hiding GHC.Arr won't work.
- GHC.Base -> hide: The spreadsheet indicates 353 usages, so hiding this module is too big a breakage to be feasible.
- GHC.Exception.Type -> hide: Has 2 usages, so let's fix them before hiding.
- GHC.Fingerprint.Type -> hide: 18 usages is a bit much to hide this module.
- GHC.InfoProv -> stabilize: This module appeared in base-4.18, and just recently there was a proposal to change one of the types. I think it's too early to stabilize this module.
- GHC.Pack -> hide: It has 10 usages but it's already deprecated. I suggest waiting a while to allow users to migrate away before hiding.
- GHC.Stack.CloneStack -> stabilize: 0 usages. This really sounds like a GHC-internal thing, so I'd suggest keeping it internal.
- GHC.Show -> internalize, export Show from new Data.Show module: Sounds like a good plan, and will require a separate CLC proposal.
- GHC.TopHandler -> hide: 6 usages is not a lot, but I would suggest helping users migrate away before hiding.

Does anyone else have any thoughts on this? It would be great to be able to move this proposal along.
These assessments are qualitative and fairly subjective. Grade 3 essentially corresponds to things that not only are likely to change but that we would also actively discourage users from relying on.
Let's take a look at System.Posix.Internals, for example. This is indeed quite a kitchen sink, and users would be better off using the unix package. But I do not see a reason for it to be fundamentally unstable: it's not like we really expect to change the type of c_open :: CFilePath -> CInt -> CMode -> IO CInt any time soon.
Why not put the actual implementation into, say, GHC.Internals.Posix, declared as an other-module, and keep re-exporting the frozen set of entities from System.Posix.Internals? And probably slap {-# DEPRECATED #-} on the latter? This way GHC developers gain the ability to add new entities to GHC.Internals.Posix as they see fit, and the PVP is not violated = clients are not disrupted.
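A minimal sketch of that arrangement might look as follows; the GHC.Internals.Posix module, its cabal placement, and the stub body of c_open are assumptions for illustration (the real code uses a foreign import):

```haskell
-- GHC/Internals/Posix.hs  (listed under other-modules: free to grow)
module GHC.Internals.Posix (CFilePath, c_open) where

import Foreign.C.String (CString)
import Foreign.C.Types (CInt)
import System.Posix.Types (CMode)

type CFilePath = CString

-- Stub body for the sketch; the real implementation wraps open(2) via the FFI.
c_open :: CFilePath -> CInt -> CMode -> IO CInt
c_open _ _ _ = pure (-1)

-- System/Posix/Internals.hs  (stays in exposed-modules: API frozen)
module System.Posix.Internals
  {-# DEPRECATED "Prefer the unix package; this module re-exports a frozen set of internals." #-}
  ( CFilePath
  , c_open
  ) where

import GHC.Internals.Posix (CFilePath, c_open)
```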
My point is that if the majority of exposed internals are historical accidents then we should not perpetuate and deepen these mistakes by excluding them from the PVP. Keep them stable, mark them as deprecated, and put new stuff into other-modules. This process can be done module by module, without any big jumps.
Based on provided examples, I'm not convinced that it's a good idea to exclude breaking API changes (even limited only to certain modules) from CLC purview. It seems more beneficial to move implementations under other-modules and keep already exposed interfaces stable.
I am, however, quite sympathetic to excluding non-breaking, additive API changes in certain namespaces from the CLC process. I'm fine with designating GHC.Internals for this. Being mindful that breaking whatever was already released as exposed would require a proposal should help GHC developers find the right balance in API design on their own, without CLC input.
I'm getting confused by all the proposed variations. So I'll reiterate my stance:
Any proposal that violates those points will get a -1 from me. I'm open to alternatives to the base split, though.
Based on provided examples, I'm not convinced that it's a good idea to exclude breaking API changes (even only limited to certain modules) from CLC purview. It seems more beneficial to move implementations under other-modules and keep already exposed interfaces stable.
In my experience, hiding exports makes certain common tasks, like minimizing reproducers for GHC bugs, more painful for little benefit, so I would not welcome such an approach.
Edit: Fixed typo from expert to export.
I agree that we should expose all the experts! If you are an expert, no need to hide, we are all mostly friendly! (SCNR)
Thanks for the feedback on this proposal. Ben said above:
To be clear, this proposal explicitly /does not/ propose to split out the internal modules of base into ghc-base. While I believe that doing so would be a reasonable step in the future, the goal of this proposal is merely to introduce a distinction between internal and external modules. We are currently proposing that the status of a module be captured in its documentation.
Our intention was to make progress on the idea of identifying which bits of base are internal, without having to wait for a machine-supported mechanism to be agreed. But it's clearly a bit unsatisfactory to have only an informally-defined split, with some special language around the PVP. @hasufell puts it well. He wants the clarity of:
The leading contender for a proper mechanism is, I believe, some version of the "split base" idea. Here is a minimal version:
- We make a new package ghc-base, on which base depends.
- ghc-base is not under the CLC purview.
- Initially, all the code in the current base stays there, and ghc-base is empty.
- The API of base does not change at all.
- GHC devs are free to add new code and exports into ghc-base. It is the place to put stuff that is not part of the base API but is needed by base.
- GHC devs are also free to move code from base into ghc-base, provided the API of base does not change. They might need to do so because a desired function to add to ghc-base depends on something in base. This might mean that modules in base become shims that simply re-export stuff from ghc-base.
- Over time, anyone can propose to remove modules from the base API. A list of candidates for such removals are the "internal" modules of this proposal. But there is no big bang. We can do this over time, in batches, whenever.
Questions:
Others will know better than I what the technical problems might be. Someone told me that Haddock doesn't work very well with shim modules... is that right? Is it fixable?
There is plenty of prior art here:
Everyone is trying to do the right thing here. I'm keen for us to find a way forward that might not do everything, but does enough to allow us to make progress.
Thanks for this great summary. This use of ghc-base will allow GHC developers the freedom to hack and experiment as they see fit, without as great a risk of breaking the rest of the world.
When moving modules from base into ghc-base, we will have to be certain not to touch the export lists, or to make explicit (listing all the exported names, not just the module itself) those that do not have them: this should prevent Haddock from rendering a blank module.
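For concreteness, such a shim might look roughly like the sketch below; the module name GHC.Internal.Arr and the abridged export list are illustrative assumptions, not the actual layout:

```haskell
-- base: GHC/Arr.hs, after the implementation has (hypothetically) moved to
-- ghc-base.  The explicit export list keeps Haddock rendering the entries
-- rather than a blank page.
module GHC.Arr
  ( Array
  , array
  , (!)
  , bounds
  ) where

import GHC.Internal.Arr (Array, array, (!), bounds)
```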
I think what @simonpj describes sounds somewhat similar to what @Bodigrim wrote here: https://github.com/haskell/core-libraries-committee/issues/145#issuecomment-1484182300
However, still unanswered is the question of who will govern ghc-base. Because, as pointed out: if base re-exports from ghc-base, CLC must govern both (or any function/type that is a transitive dependency of any base-exported function... who will keep track of this subset of ghc-base?). Otherwise CLC can't fulfill its duty.
To make it more clear. Assuming we have a consensus on what "GHC internals" means and "base" is everything that's not GHC internal, then we have:
The problem is 2. and 3., because both would have to be moved into ghc-base eventually, if we want all GHC internals to live there (does anyone know how much of each of those entities we have?). Even if we go with the gradual approach, the only thing that's feasible without major effort and communication is migrating 5.
However, even if we stagnate at 5., it might still be an improvement and allow us to deprecate some of the GHC internals in base.
I propose to start with a ghc-base that depends on base. We already have a "ghc-base used by base", and it's called ghc-prim.
That's a good point; we already do have ghc-prim for this purpose. I'd hate for the GHC codebase to become even more convoluted and unmanageable than it already is (or appears to be). Limiting the interwovenness of internal packages and organizing core code in a sensible way will also improve stability, ease of onboarding, and community participation.
However, still unanswered is the question who will govern ghc-base. Because as pointed out: if base re-exports from ghc-base, CLC must govern both (or any function/type that is a transitive dependency of any base exported function... who will keep track of this subset of ghc-base?). Otherwise CLC can't fulfill its duty.
I think the answer is simple:
- GHC devs are free to change ghc-base.
- But if any change there is visible in the API of base, we must consult CLC.
Simple! CLC controls the API of base, and changes there need their approval.
I think ghc-prim is a bit of a red herring here. It exists as a separate package so that we can build the integer package, on which base depends. It should not matter to clients how GHC structures its internals -- it's an implementation matter. Of course, we have a strong incentive to make it as simple as possible!
To put it another way, GHC could implement ghc-base today, and provided we don't change the API of base, no one outside GHC-dev-land should know or care. I'm just floating the idea here because I think we'll all work together better if we communicate well. And I'm a little hazy about whether our tech (esp. Haddock) is up to making it invisible whether a function is implemented in base itself or is a re-export of something from another package. (Of course this already arises for types and functions defined in ghc-prim.)
But if any change there is visible in the API of base, we must consult CLC
This isn't always that straightforward in light of transitive function dependencies (e.g. function A is re-exported from base, but also uses internal function B, which is not re-exported... GHC devs change function B and accidentally break performance properties).
Remember, CLC does care about laziness/performance properties.
Who is going to assess whether there is a visible impact on base API?
@hasufell, I think that the transitive dependency issue is surmountable. GHC developers, aided by CI, would assess impact. We already have work in progress to verify that base's exports and their types are not inadvertently changed. In principle we could (and perhaps should) add tests to verify the laziness properties of its exports. Performance is clearly harder, but I would argue that the difficulty imposed by splitting base and ghc-base is negligible.
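As an illustration of the kind of laziness check meant here, a hypothetical test (not something currently in base's test suite) could assert that take and map do not force more of their input than they return:

```haskell
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- The tail of the input list is an error thunk; if 'take' or 'map' became
  -- stricter in the tail, evaluating the sum would throw.
  n <- evaluate (sum (take 3 (map (* 2) ([1, 2, 3] ++ error "too strict"))))
  print (n :: Int)   -- prints 12
```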
Who is going to assess whether there is a visible impact on base API?
Good question. @bgamari has a patch that mechanically checks the API of base, and complains if there are any changes. This is way more than we have right now!
This isn't always that straightforward in light of transitive function dependencies (e.g. function A is re-exported from base, but also uses internal function B, which is not re-exported... GHC devs change function B and accidentally break performance properties).
That is true, but it is already a challenge:
- base depends on ghc-bignum and ghc-prim, so changes in either of those packages could change performance.
- We don't consult the CLC on every change to base, only those that the author thinks might affect performance. It'd be the same here. Changes to ghc-base that affect the performance of base functions should be discussed with CLC.
One Good Thing would be to expand the performance tests in the base test suite, and that is something the CLC might consider. But I see no systematic way to guarantee that changes to GHC or its ecosystem of libraries won't have an effect on the perf of one or more base exports. We just all have to do the best we can -- and that "best" is unaffected by whether the code happens to live in base or in ghc-base.
Another bit of technology is on its way: deprecated exports. See GHC proposal #134, GHC ticket #4879, and an upcoming merge request. This will make it easier when we want to remove internal modules from base and export them from ghc-base only.
Apparently the Haddock issue really isn't an issue. For example, here is GHC.Exts which is mostly just a shim, and its Haddock page looks just fine.
@simonpj I do appreciate your input, but I don't enjoy derailing the conversation. Please share your opinion on "split base" in the relevant issue. As you aptly quoted, this proposal is deliberately not about it.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck. This is a false impression: there is a long queue of already approved proposals which GHC developers tarry to finish (MRs !8912, !10176, !10171, !10132) and another queue of proposals where the CLC is ready to vote, but MRs from GHC developers are still pending (#126, #133, #134). This makes me wonder even more about the benefits of excluding things from CLC purview.
Please share you opinion on "split base" in the relevant issue
You mean here? Yes, good idea. I will do that. My reason for floating it here is that, given the lack of enthusiasm for our proposal above, I wanted to gauge the CLC's enthusiasm for:
- removing internal modules from the base API and putting them in ghc-base
Question: would that plan meet with CLC approval? I am keen to pursue a path that enjoys CLC support.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck
Not at all. It's rather that I want a clear way to signal to users that a particular data type or function is internal, and should only be used if you want to be exposed to GHC internals. We have no way to signal that at the moment, and that inevitably leads users to accidentally depend on things they shouldn't, through no fault of their own; and then creates pressure to keep these accidental and unintended interfaces stable into the future. By providing a structural way to express intent, we will avoid future pain.
Forgive me if this is a silly question, but does the exposed-modules/other-modules distinction in the .cabal file not provide enough distinction? If these are indeed cabal packages, there is support for "internal libraries" as well. This would allow an equally precise segregation of "within CLC scope" modules and "outside CLC scope" modules, without stumbling into the larger issue of actually cleaving base in two.
does the exposed-modules/other-modules distinction in the .cabal file not provide enough distinction?
Alas, no. other-modules cannot be imported, ever. But high-performance users specifically want to reach into the internals of GHC (e.g. the representation of arbitrary-precision integers), even at the cost of being exposed to churn in those internals. Performance testing and debugging are other reasons that we want it to be possible to import internal modules. If cabal provided a 3-way distinction (hidden, internal, external), with the latter two being exposed, that would indeed address the problem.
But changing cabal would take time, and some people say "no need to change cabal, just split your package into two", so it's not a slam-dunk change.
(With the new feature of multiple public library components in cabal, this split could be made easier to manage)
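For a flavour of the kind of "reach into the internals" meant here, the sketch below inspects the current representation of Integer. It assumes a direct dependency on ghc-bignum, and the IS/IP/IN constructors are an implementation detail that may change between GHC versions, which is exactly the churn such users accept:

```haskell
import GHC.Num.Integer (Integer (IS, IP, IN))

-- True for Integers that currently fit in a single machine word.
isSmallInteger :: Integer -> Bool
isSmallInteger (IS _) = True    -- small, word-sized representation
isSmallInteger (IP _) = False   -- large positive bignum
isSmallInteger (IN _) = False   -- large negative bignum

main :: IO ()
main = print (map isSmallInteger [42, 2 ^ (100 :: Int)])
```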
Let's go back to this very proposal:
If we agree to internal modules, excluded from CLC purview, while maintaining PVP adherence... then that doesn't seem too bad. Although one could argue it might cause churn, because of more aggressive bumping of the major base version... I don't think it matters much, since GHC ships with base anyway (although theoretically, a GHC patch release could introduce a major base bump, which can be problematic... we might just have to avoid it).
In case we want base to be reinstallable and decouple it from GHC, then we can execute the base split. Because then, base PVP is much more useful/important than it is now.
So that could be a possible stopgap.
My analysis is similar to @hasufell here.
I'm uneasy with explicitly granting PVP exclusions. Given that the proposer (or anyone else) cannot recall a use case, I'd rather omit this for now. FWIW, CLC does not normally opine on versioning of base; GHC developers are bumping versions on their own.
I'm fine with documenting some (potentially, a wide range of) modules as internal. In my view this is a necessary prerequisite for a potential future package split into ghc-base and base, if any.
I'm generally agreeable with allowing "additive" changes to "internal" modules, potentially even without CLC approval. There is a certain subtlety about who is to decide that a change is purely additive. There is the usual complication with type classes and instances: adding a new type class or instance to base could have far-reaching consequences, so it might be worth a consultation.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck
Not at all.
Interesting. I'd say that excluding modules from CLC purview is the most contentious part of the proposal. If it is not a key point, I'd suggest (in my private capacity) the following:
- Document a wide range of modules of base (roughly in line with Ben's analysis) as "internal".
- Accept that breaking base, even in its "internal" parts, is not in our best interest.
- Do not put class Profunctor into, say, a GHC.Internal.Profunctor module, because everyone will ignore warnings about instability and just go for it.
@simonpj @bgamari does it look reasonable to you?
I am a little confused here. Ben is proposing to provide some structure where there is currently none (in effect), so that the GHC team and the CLC can do a better job of managing change, and so that everyone else can better understand what is stable and can be relied upon, and what is internal and more liable to change. This is a good thing, yes?
Does it make sense to object on the grounds that any change anywhere might be visible and therefore every change whatsoever going forward to anything in GHC must be subjected to a CLC approval process? This seems to be the logic of the objections, applied consistently.
Can we not make things a little better without trying to make them perfect? For somebody outside of GHC and the CLC, Simon's proposal appears perfectly reasonable.
[I wrote this comment without realising I was looking at the thread without @Bodigrim 's latest comment which it does not reflect — sorry.]
@Bodigrim's proposal seems reasonable to me, but the devil could be in the detail. I will be interested to hear what Ben and Simon think.
(The proposal eventually approved in this thread is https://github.com/haskell/core-libraries-committee/issues/146#issuecomment-1591871779 — Bodigrim, Sep 2023)
1. Background
Currently the base package exposes many internal details of the implementation of base functionality. By "internal implementation details" we mean functions and data types that are part of GHC's realisation of some exposed function, but which were never intended to be directly used by clients of the base library. For instance, the GHC.Base.mapFB function is a necessary exposed part of the fusion framework for map, but one which GHC's authors never intended users to call.
This lack of clarity is bad for several reasons:
Users have no way to know which functions are part of the "intended, stable API" and which are part of the "internal, implementation details". Consequently, they may accidentally rely on the latter; they simply have no way to tell.
GHC's developers are hampered in modifying the implementation because too much is exposed. This imposes a high backward-compatibility burden, one that is an accident of history.
This status quo leaves much to be desired: users tend to rely on any interface available to them, and therefore GHC developers are susceptible to breaking users when changing implementation details within base. On the other hand, there is a clear need to be able to iterate on the implementation of GHC and its base library: fixing compiler bugs may require the introduction of new internal yet exposed primitives (c.f. the changes made in the implementation of unsafeCoerce in GHC 9.0) and improving runtime performance may require changes in the types of exposed internal implementation details (c.f. GHC #22946).
These difficulties are discussed in CLC #105.
2. Proposal
We propose to classify the modules of base into three groups:
- Hidden: these are simply the existing non-exposed modules (other-module in Cabal terms). No change here.
- External: these modules comprise the public API of base. They are listed in the exposed-modules Cabal section.
- Internal: these modules are part of the internal implementation of base functions. They are also listed in the exposed-modules Cabal section.
As of today, all modules are either Hidden or External; the CLC policy is that the API of all exposed modules is subject to CLC review.
The main payload of this proposal is as follows.
2.1 Codifying the Internal vs External split
Our proposal is simply to declare whether a module is Internal or External, using some out-of-band mechanism like a publicly visible list.
However, future reorganizations (notably HF tech proposal #47) might split base into two packages:
- ghc-base, all of whose exposed modules are Internal.
- base, all of whose exposed modules are External.
That would codify the distinction between Internal and External, which would be a Good Thing. But the burden of this proposal is simply to make that distinction in the first place, and start a dialogue about which modules belong in each category.
Incidentally, the Stability Haddock field of a module is not the same as the Internal vs External distinction. A module could be External (i.e. designed for external callers), and yet experimental and not yet stable. That seems to be the intended purpose of the Stability field, although it is not well described anywhere (please tell us there is a good specification).
We propose to document internal modules via a yet-to-be-named Haddock field.
2.2 Module by module summary
To make the discussion concrete, we have characterized each of the exposed modules in the GHC.* namespace along three axes:
These findings, along with the stability indicated by the modules' Stability Haddock field, are summarized in this spreadsheet. We then used these assessments to define an action plan (seen in the "Action" column) which will bring us closer to the goal of clearly delineating the stable interface of base. We do not intend to pursue this plan as one atomic change; rather, we intend for this plan to be an aspiration which we will iteratively approach over the course of the coming years, largely driven by the needs of the GHC developers.
The proposed actions fall into a few broad buckets:
In the sections below we will discuss some of the reasoning behind these proposed actions and draw attention to some open questions.
3. The question of GHC.Exts
Historically GHC.Exts has been the primary entry-point for users wanting access to all of the primitives that GHC exposes (e.g. primitive types, operations, and other magic). This widely-used module poses a conundrum since, while many of these details are quite stable (e.g. Int#), a few others truly are exposing implementation details which cannot be safely used in a GHC-version-agnostic way (e.g. mkApUpd0#, unpackClosure#, threadStatus#). There are at least two ways by which this might be addressed:
- Keep only the stable subset (e.g. Int#, Weak#, newArray#, etc.) in GHC.Exts, leaving the rest to only be exposed via GHC.Prim (which should not be used by end-users), or
- Declare GHC.Exts to be unstable and export the stable subset from another namespace (e.g. Word# and its operations could be exposed by GHC.Unboxed.Word, as sketched below).
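A minimal sketch of the second option; the module name GHC.Unboxed.Word comes from the text above, while the export list is purely illustrative:

```haskell
{-# LANGUAGE MagicHash #-}
-- A narrowly scoped, stable-by-construction module re-exporting only a few
-- unboxed Word primitives, leaving GHC.Exts free to change.
module GHC.Unboxed.Word
  ( Word#
  , plusWord#
  , minusWord#
  , timesWord#
  ) where

import GHC.Exts (Word#, plusWord#, minusWord#, timesWord#)
```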
4. Non-normative interfaces
Several interfaces exposed by base intentionally reflect internal details of GHC's implementation and, by their nature, should change to reflect changes in the underlying implementation. Here we call such interfaces "non-normative", as they are defined not by a specification of desired Haskell interfaces but rather by the system that they reflect.
One such module is GHC.Stats, which allows the user to reflect on various statistics about the operation of the runtime system. If the runtime system were to change (e.g. by adding a new phase of garbage collection), users would expect the module to change as well. For this reason, we mark such non-normative interfaces as "internal".
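For context, a small usage example of this non-normative interface; the exact fields available track whatever the current RTS provides, and the program must be run with statistics enabled (+RTS -T -RTS):

```haskell
import GHC.Stats (getRTSStats, gcs, max_live_bytes)

main :: IO ()
main = do
  -- getRTSStats throws unless the program is run with the -T RTS option.
  stats <- getRTSStats
  putStrLn ("GC count so far: " ++ show (gcs stats))
  putStrLn ("Max live bytes:  " ++ show (max_live_bytes stats))
```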