We propose to divide the modules of base into three groups
s/divide/classify/, perhaps? "divide" makes it sound like you're intending to carry out some operation on the source repo, but I think you mean just "publish a classification". In the subsequent section you use "declare", and explain that the "division" is done out-of-band. I think it might be wise to foreshadow this by changing "divide" to "classify", to avoid misdirecting the reader.
There are two proposed axes of distinction between external and internal:
I strictly disagree with the second point. If a change is breaking, then it requires a major version bump. I don't care whether the module is marked as "implementation specific internal". If you expose it, then it's part of the public API.
If you want to fix that issue, introduce a new package (ghc-base) and version it properly. Then if base doesn't re-export affected stuff from ghc-base, it can stay at the same major version.
And as far as I understand, it's the first point which causes most friction. GHC devs need to change something in GHC-specific stuff, but have to go through the CLC. On that I agree; the clarification will make everyone's life easier.
But I repeat: do not bundle these two unrelated considerations. As an (occasional) user of low-level base stuff, I do care about it being versioned as the PVP specifies. Otherwise I'd be forced to use tighter upper bounds on base. If you want to version GHC specifics separately, put them into a separate package. Until then, version them properly.
Finally, if the maintenance approach for truly internal modules (i.e. the possible future ghc-base) is
The GHC team makes no effort to maintain the stability of this API
then the whole thing is not worth doing. You probably want to word that differently.
If a change is breaking, then it requires a major version bump.
As a matter of principle, I have no objection to continuing to bump the version of base when internal declarations change in a breaking way. This is one reason why I support #145.
However, until this happens (which could happen in time for GHC 9.8, if we move quickly) we will need to be pragmatic (as we have been in the past). Concretely, this means accepting the (small) possibility that breaking changes in internal modules may occur in minor version bumps. Of course, we will do due diligence to minimize the damage, but if there is a change needed in an internal declaration for a soundness fix (or something of similar severity) then we will make the change with a minor bump. Doing otherwise would merely cause undue harm to users for no perceivable benefit.
@bgamari
(as we have been in the past).
Can you provide a short summary of such breaking changes deep in base which should technically have been major bumps but weren't? I don't remember any in recent years.
EDIT: I know that the ghc library doesn't try very hard to be PVP compliant, as a lot of stuff is exported, and there are many fixes and changes in ghc itself between minor versions. But that's another story: it's not base.
Can you provide a short summary of such breaking changes deep in base which should technically have been major bumps but weren't? I don't remember any in recent years.
Such cases are (and should be) rare. Producing a comprehensive summary of these changes would require a fair amount of effort, not because there are numerous cases but because they tend to be small and subtle. On a cursory look through the last few releases I was unable to find a single one.
However, the principle stands: if we need to change, e.g., the type of GHC.Desugar.toAnnotationWrapper in a minor GHC (and therefore base) release to fix a soundness issue, then we should do so without putting the rest of the ecosystem through the pain of a major bump in the base version.
To reiterate, ideally the likes of GHC.Desugar wouldn't live in base at all, but sadly we don't yet live in a world where that is true.
To reiterate, ideally the likes of GHC.Desugar wouldn't live in base at all
Does that module need to be public? If it's used in desugaring, can't GHC access symbols in non-public modules? At least TH can, AFAIK.
EDIT: I understand that you are trying to be prepared for very unexpected things. But if you find it hard to come up with a convincing example, maybe that's a sign that asking the CLC about the occasional slight PVP sin isn't really that bad?
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
Yes, this is precisely the pragmatic approach that I was suggesting in the original comment.
s/divide/classify/
Quite right. I've made this change.
EDIT2: Or, if it's in a non-CLC module, then just do it, warn users in the base changelog, and test against Stackage to confirm that it doesn't actually break anyone.
Yes, this is precisely the pragmatic approach that I was suggesting in the original comment.
But that is different from what the proposal says, especially as the proposal says that GHC.Stats is internal, meaning that it might change in any minor GHC release. I don't agree with that; such a soundness fix might need to wait for a major GHC release, until GHC.Stats is moved to ghc-base.
The bar for making a breaking change (in base, even in "internal" modules) in a minor GHC version should be set high, and that should be said explicitly in the proposal, not in the discussion.
I think one thing that might be good to draw out is how we will communicate the status of base modules going forward.
How would a first-time contributor to base find out whether the module they want to modify needs a CLC proposal or not?
Some possibilities:
Another question is what the procedure will be for modules moving from one group to another, or for the introduction of new modules. For instance would adding a new internal module need CLC review? What about moving a module from internal to external?
But that is different from what the proposal says, especially as the proposal says that GHC.Stats is internal, meaning that it might change in any minor GHC release. I don't agree with that; such a soundness fix might need to wait for a major GHC release, until GHC.Stats is moved to ghc-base.
To be clear, this proposal explicitly /does not/ propose to split out the internal modules of base into ghc-base. While I believe that doing so would be a reasonable step in the future, the goal of this proposal is merely to introduce a distinction between internal and external modules. We are currently proposing that the status of a module be captured in its documentation.
As long as internal modules remain in base, we believe that it is reasonable to reserve the right to evolve internal modules outside of the PVP. However, we also agree that the PVP holds value and therefore will naturally take care when exercising this right.
I think one thing that might be good to draw out is how we will communicate the status of base modules going forward.
Yes, clear communication will be essential. I have added a small note suggesting that we introduce a new Haddock field to mark internal modules. Adding a mention to the MR template also seems like a reasonable idea.
Another question is what the procedure will be for modules moving from one group to another, or for the introduction of new modules. For instance would adding a new internal module need CLC review? What about moving a module from internal to external?
This proposal seeks to establish the /concept/ of an internal module and a roadmap for which modules we would like to move to internal status in the future. While it would be great if the CLC would summarily accept a swath of these changes, we are willing to propose concrete changes in future, smaller CLC proposals if necessary.
@bgamari are you suggesting this as a stop gap or a permanent solution?
@bgamari are you suggesting this as a stop gap or a permanent solution?
I am suggesting this as a potentially-permanent solution. That being said, I do believe that #145 would be a considerable improvement and would love to see it adopted in the future.
Do note that I have dropped the section entitled "where to place internal declarations" as it contained language from the editing process which was not intended to be in this proposal (namely the renaming of existing modules).
I like the idea of code being marked as stable/unstable, however: no docs > wrong docs > docs of indeterminate correctness
So this can only work if the "this module is stable" documentation is kept accurate somehow. I'd be strongly in favour of this proposal if there was a mechanism to do so, and undecided if not.
The reason I'd be undecided without such a mechanism is that we already have a stability field in Haddock, and it is widely ignored anyway.
(The other question is for some sort of linting about whether unstable modules are used in a project, but that's not in scope of this proposal afaict.)
@bgamari are you suggesting this as a stop gap or a permanent solution?
I am suggesting this as a potentially-permanent solution. That being said, I do believe that #145 would be a considerable improvement and would love to see it adopted in the future.
I see.
In that case I'm leaning towards -1 on this proposal, since I believe violating PVP in the standard library sets a bad example (and as indicated in the PVP ticket, I'm also against formalizing it into PVP).
To me, the only pragmatic way forward is the base split.
I'm starting to find it hard to follow all the inter-connected tickets about this, though.
For instance, the GHC.Base.mapFB function is a necessary exposed part of the fusion framework for map, but one which GHC's authors never intended users to call.
@bgamari could you please elaborate on this? There is no necessity to expose functions used by the list fusion framework, and indeed no other function of it (e.g., zipWithFB, mapAccumLF, unwordsFB) is exposed. It seems mapFB is just an oversight and unfit to support your point.
@bgamari could you please elaborate on this? There is no necessity to expose functions used by the list fusion framework, and indeed no other function of it (e.g., zipWithFB, mapAccumLF, unwordsFB) is exposed. It seems mapFB is just an oversight and unfit to support your point.
Yes, this is a fair point; mapFB and friends indeed need not be exposed, as they are only used in rules defined in GHC.Base; this could be resolved with a proper export list. However, even without leaving GHC.Base there are other examples; for instance, maxInt exists to only serve other modules in base.
Looking beyond GHC.Base, there are the many exports of Data.Typeable.Internals (e.g. mkTrType), the array implementation in GHC.Arr, and the exports of GHC.Err.
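To make the fusion point concrete, here is a minimal sketch of the fold/build pattern that a helper like mapFB supports. The names (myMap, myMapFB) are illustrative rather than the actual GHC.Base definitions, and the real rules differ in detail:

```haskell
module FusionSketch (myMap) where

import GHC.Exts (build)

-- The helper: turns a mapping function into a "cons-like" step for foldr.
-- It is referenced by the rewrite rules below; because the rules live in
-- the same module, it does not have to appear in the export list.
myMapFB :: (b -> l -> l) -> (a -> b) -> a -> l -> l
{-# INLINE [0] myMapFB #-}
myMapFB c f = \x ys -> c (f x) ys

myMap :: (a -> b) -> [a] -> [b]
{-# NOINLINE [0] myMap #-}
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs

{-# RULES
"myMap"     [~1] forall f xs. myMap f xs = build (\c n -> foldr (myMapFB c f) n xs)
"myMapList" [1]  forall f.    foldr (myMapFB (:) f) [] = myMap f
  #-}
```

The rules can still fire in client modules even though the helper is not in the export list, which is why hiding such helpers is generally possible.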
However, even without leaving GHC.Base there are other examples; for instance, maxInt exists to only serve other modules in base.
(Assuming for a moment that maxInt is potentially unstable.)
The only place maxInt is used is GHC.Enum; it could have been defined there and not exposed outside. Even if it were used in multiple modules, it could have been defined in an other-module and never exposed to users.
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed (and more specifically - exposed from base and not from elsewhere), or is it a historical accident / negligence?
Edit: this comment misunderstood @Bodigrim's above question
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed, or is it a historical accident / negligence?
Yes, there are places where we must expose declarations which otherwise shouldn't be used. Off hand I can think of at least three concrete reasons why this happens:
(GHC.IO.mkUserError)
(Data.Typeable.Internals)
More generally, my question is this. I understand that many fragile entities have been exposed from base. Is there a genuine reason for them to be publicly exposed (and more specifically - exposed from base and not from elsewhere), or is it a historical accident / negligence?
These cases are indeed largely due to historical accident. In many cases modules have been exposed via exposed-modules which should rather have been hidden in other-modules. These internal modules are often necessary either to break up module import cycles or to satisfy the needs of generated code, and were not intended to be used by end-users.
Is it difficult to keep modules which have been exposed due to historical accident stable (or at least avoid breaking changes)? And to contain new work to other-modules?
@Bodigrim it very much depends upon the case; this is essentially what I try to capture in the "stability risk" column of the spreadsheet. Anything assessed to be 0 or 1 will likely be quite easy to keep stable (and consequently I generally propose that these be "stabilized" except in cases where there is no evidence of external usage).
Modules assessed to be a higher stability risk would be harder. In some cases we propose that these be stabilized despite this (in particular, in cases where we find high degrees of dependence in the ecosystem). However, there are certainly a number of modules which we would prefer to hide.
Regardless, yes, we will need to be more careful to contain new work to other-modules in the future.
What does Stability risk grade 3 stand for?
These assessments are qualitative and fairly subjective. Grade 3 essentially corresponds to things that not only are likely to change but that we would also actively discourage users from relying on.
I like the proposal overall. Here's my constructive feedback.
Incidentally, the Stability Haddock field of a module is not the same as the Internal vs External distinction. A module could be External (i.e. designed for external callers), and yet experimental and not yet stable. That seems to be the intended purpose of the Stability field, although it is not well described anywhere (please tell us there is a good specification).
We propose to document internal modules via a yet-to-be-named Haddock field.
I don't think we need a new field. I suggest reusing Stability and introducing new values if necessary. This is only a matter of documentation. Currently, the relevant section in the Haddock docs doesn't specify the meaning of stable, experimental, provisional and internal.
I suggest opening a PR to Haddock with the description of these four fields as the immediate next step.
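As a minimal sketch of what reusing the field could look like (the module name GHC.Foo.Internal and the internal value are hypothetical, pending the Haddock documentation PR):

```haskell
-- |
-- Module      : GHC.Foo.Internal
-- Copyright   : (c) The GHC Team
-- License     : BSD-3-Clause
-- Maintainer  : ghc-devs@haskell.org
-- Stability   : internal
-- Portability : non-portable (GHC extensions)
--
-- Implementation details of a hypothetical GHC.Foo; not part of the
-- PVP-stable API of base.
module GHC.Foo.Internal () where
```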
The proposed actions fall into a few broad buckets:
- Internalize, which denotes the GHC developers' intent to open a CLC proposal in the future to move the module from External to Internal.
- Hide, which denotes the GHC developers' intent to open a CLC proposal in the future to move the module from External to Hidden.
Currently, the Base stability spreadsheet wants to either hide or internalize a total of 38 modules. Creating 38 CLC proposals (one per module) sounds like a huge amount of work.
I suggest at least splitting this process into batches. In fact, I believe we can agree on most of the modules directly in this CLC proposal.
The question of GHC.Exts
GHC.Exts sounds like a PITA. My recommendation would be to:
GHC.Exts immediately today
To make the discussion concrete, we have characterized each of the exposed modules in the GHC.* namespace along three axes:
My general view:
I would like to provide my comments on all the other individual modules:
- GHC.Arr -> hide: The array package depends on base and imports GHC.Arr, so hiding GHC.Arr won't work.
- GHC.Base -> hide: The spreadsheet indicates 353 usages, so hiding this module is too big a breakage to be feasible.
- GHC.Exception.Type -> hide: Has 2 usages, so let's fix them before hiding.
- GHC.Fingerprint.Type -> hide: 18 usages is a bit much to hide this module.
- GHC.InfoProv -> stabilize: This module appeared in base-4.18, and just recently there was a proposal to change one of the types. I think it's too early to stabilize this module.
- GHC.Pack -> hide: It has 10 usages but it's already deprecated. I suggest waiting a while to allow users to migrate away before hiding.
- GHC.Stack.CloneStack -> stabilize: 0 usages. This really sounds like a GHC-internal thing, so I'd suggest keeping it internal.
- GHC.Show -> internalize, export Show from new Data.Show module: Sounds like a good plan, and will require a separate CLC proposal.
- GHC.TopHandler -> hide: 6 usages is not a lot, but I would suggest helping users migrate away before hiding.

Does anyone else have any thoughts on this? It would be great to be able to move this proposal along.
These assessments are qualitative and fairly subjective. Grade 3 essentially corresponds to things that not only are likely to change but that we would also actively discourage users from relying on.
Let's take a look at System.Posix.Internals, for example. This is indeed quite a kitchen sink, and users would be better off using the unix package. But I do not see a reason for it to be fundamentally unstable: it's not like we really expect to change the type of c_open :: CFilePath -> CInt -> CMode -> IO CInt any time soon.
Why not put the actual implementation into, say, GHC.Internals.Posix, declared as an other-module, and keep re-exporting the frozen set of entities from System.Posix.Internals? And probably slap {-# DEPRECATED #-} on the latter? This way GHC developers gain the ability to add new entities to GHC.Internals.Posix as they see fit, and the PVP is not violated = clients are not disrupted.
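A minimal sketch of that arrangement might look as follows; the GHC.Internals.Posix module, its cabal placement, and the stub body of c_open are assumptions for illustration (the real code uses a foreign import):

```haskell
-- GHC/Internals/Posix.hs  (listed under other-modules: free to grow)
module GHC.Internals.Posix (CFilePath, c_open) where

import Foreign.C.String (CString)
import Foreign.C.Types (CInt)
import System.Posix.Types (CMode)

type CFilePath = CString

-- Stub body for the sketch; the real implementation wraps open(2) via the FFI.
c_open :: CFilePath -> CInt -> CMode -> IO CInt
c_open _ _ _ = pure (-1)

-- System/Posix/Internals.hs  (stays in exposed-modules: API frozen)
module System.Posix.Internals
  {-# DEPRECATED "Prefer the unix package; this module re-exports a frozen set of internals." #-}
  ( CFilePath
  , c_open
  ) where

import GHC.Internals.Posix (CFilePath, c_open)
```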
My point is that if the majority of exposed internals are historical accidents then we should not perpetuate and deepen these mistakes by excluding them from the PVP. Keep them stable, mark them as deprecated, and put new stuff into other-modules. This process can be done module by module, without any big jumps.
Based on provided examples, I'm not convinced that it's a good idea to exclude breaking API changes (even limited only to certain modules) from CLC purview. It seems more beneficial to move implementations under other-modules and keep already exposed interfaces stable.
I am, however, quite sympathetic to excluding non-breaking, additive API changes in certain namespaces from the CLC process. I'm fine with designating GHC.Internals for this. Being mindful that breaking whatever was already released as exposed would require a proposal should help GHC developers find the right balance in API design on their own, without CLC input.
I'm getting confused by all the proposed variations. So I'll reiterate my stance:
Any proposal that violates those points will get a -1 from me. I'm open to alternatives to the base split, though.
Based on provided examples, I'm not convinced that it's a good idea to exclude breaking API changes (even only limited to certain modules) from CLC purview. It seems more beneficial to move implementations under other-modules and keep already exposed interfaces stable.
In my experience, hiding exports makes certain common tasks, like minimizing reproducers for GHC bugs, more painful for little benefit, so I would not welcome such an approach.
Edit: Fixed typo from expert to export.
I agree that we should expose all the experts! If you are an expert, no need to hide, we are all mostly friendly! (SCNR)
Thanks for the feedback on this proposal. Ben said above:
To be clear, this proposal explicitly /does not/ propose to split out the internal modules of base into ghc-base. While I believe that doing so would be a reasonable step in the future, the goal of this proposal is merely to introduce a distinction between internal and external modules. We are currently proposing that the status of a module be captured in its documentation.
Our intention was to make progress on the idea of identifying which bits of base are internal, without having to wait for a machine-supported mechanism to be agreed. But it's clearly a bit unsatisfactory to have only an informally-defined split, with some special language around the PVP. @hasufell puts it well. He wants the clarity of:
The leading contender for a proper mechanism is, I believe, some version of the "split base" idea. Here is a minimal version:
- We make a new package ghc-base, on which base depends.
- ghc-base is not under the CLC purview.
- Initially, all the code in the current base stays there, and ghc-base is empty.
- The API of base does not change at all.
- GHC devs are free to add new code and exports into ghc-base. It is the place to put stuff that is not part of the base API but is needed by base.
- GHC devs are also free to move code from base into ghc-base, provided the API of base does not change. They might need to do so because a desired function to add to ghc-base depends on something in base. This might mean that modules in base become shims that simply re-export stuff from ghc-base.
- Over time, anyone can propose to remove modules from the base API. A list of candidates for such removals are the "internal" modules of this proposal. But there is no big bang. We can do this over time, in batches, whenever.
Questions:
Others will know better than I what the technical problems might be. Someone told me that Haddock doesn't work very well with shim modules... is that right? Is it fixable?
There is plenty of prior art here:
Everyone is trying to do the right thing here. I'm keen for us to find a way forward that might not do everything, but does enough to allow us to make progress.
Thanks for this great summary. This use of ghc-base will allow GHC developers the freedom to hack and experiment as they see fit, without as great a risk of breaking the rest of the world.
When moving modules from base into ghc-base, we will have to be certain not to touch the export lists, or to make explicit (listing all the exported names, not just the module itself) those that do not have them: this should prevent Haddock from rendering a blank module.
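For concreteness, such a shim might look roughly like the sketch below; the module name GHC.Internal.Arr and the abridged export list are illustrative assumptions, not the actual layout:

```haskell
-- base: GHC/Arr.hs, after the implementation has (hypothetically) moved to
-- ghc-base.  The explicit export list keeps Haddock rendering the entries
-- rather than a blank page.
module GHC.Arr
  ( Array
  , array
  , (!)
  , bounds
  ) where

import GHC.Internal.Arr (Array, array, (!), bounds)
```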
I think what @simonpj describes sounds somewhat similar to what @Bodigrim wrote here: https://github.com/haskell/core-libraries-committee/issues/145#issuecomment-1484182300
However, still unanswered is the question of who will govern ghc-base. Because, as pointed out: if base re-exports from ghc-base, CLC must govern both (or any function/type that is a transitive dependency of any base-exported function... who will keep track of this subset of ghc-base?). Otherwise CLC can't fulfill its duty.
To make it more clear. Assuming we have a consensus on what "GHC internals" means and "base" is everything that's not GHC internal, then we have:
The problem is 2. and 3., because both would have to be moved into ghc-base eventually, if we want all GHC internals to live there (does anyone know how much of each of those entities we have?). Even if we go with the gradual approach, the only thing that's feasible without major effort and communication is migrating 5.
However, even if we stagnate at 5., it might still be an improvement and allow us to deprecate some of the GHC internals in base.
I propose to start with a ghc-base that depends on base. We already have a "ghc-base used by base", and it's called ghc-prim.
That's a good point; we already do have ghc-prim for this purpose. I'd hate for the GHC codebase to become even more convoluted and unmanageable than it already is (or appears to be). Limiting the interwovenness of internal packages and organizing core code in a sensible way will also improve stability, ease of onboarding, and community participation.
However, still unanswered is the question who will govern ghc-base. Because as pointed out: if base re-exports from ghc-base, CLC must govern both (or any function/type that is a transitive dependency of any base exported function... who will keep track of this subset of ghc-base?). Otherwise CLC can't fulfill its duty.
I think the answer is simple:
- GHC devs are free to change ghc-base.
- But if any change there is visible in the API of base, we must consult CLC.
Simple! CLC controls the API of base, and changes there need their approval.
I think ghc-prim is a bit of a red herring here. It exists as a separate package so that we can build the integer package, on which base depends. It should not matter to clients how GHC structures its internals -- it's an implementation matter. Of course, we have a strong incentive to make it as simple as possible!
To put it another way, GHC could implement ghc-base today, and provided we don't change the API of base, no one outside GHC-dev-land should know or care. I'm just floating the idea here because I think we'll all work together better if we communicate well. And I'm a little hazy about whether our tech (esp. Haddock) is up to making it invisible whether a function is implemented in base itself or is a re-export of something from another package. (Of course this already arises for types and functions defined in ghc-prim.)
But if any change there is visible in the API of base, we must consult CLC
This isn't always that straightforward in light of transitive function dependencies (e.g. function A is re-exported from base, but also uses internal function B, which is not re-exported... GHC devs change function B and accidentally break performance properties).
Remember, CLC does care about laziness/performance properties.
Who is going to assess whether there is a visible impact on base API?
@hasufell, I think that the transitive dependency issue is surmountable. GHC developers, aided by CI, would assess impact. We already have work in progress to verify that base's exports and their types are not inadvertently changed. In principle we could (and perhaps should) add tests to verify the laziness properties of its exports. Performance is clearly harder, but I would argue that the difficulty imposed by splitting base and ghc-base is negligible.
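As an illustration of the kind of laziness check meant here, a hypothetical test (not something currently in base's test suite) could assert that take and map do not force more of their input than they return:

```haskell
import Control.Exception (evaluate)

main :: IO ()
main = do
  -- The tail of the input list is an error thunk; if 'take' or 'map' became
  -- stricter in the tail, evaluating the sum would throw.
  n <- evaluate (sum (take 3 (map (* 2) ([1, 2, 3] ++ error "too strict"))))
  print (n :: Int)   -- prints 12
```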
Who is going to assess whether there is a visible impact on base API?
Good question. @bgamari has a patch that mechanically checks the API of base, and complains if there are any changes. This is way more than we have right now!
This isn't always that straightforward in light of transitive function dependencies (e.g. function A is re-exported from base, but also uses internal function B, which is not re-exported... GHC devs change function B and accidentally break performance properties).
That is true, but it is already a challenge:
- base depends on ghc-bignum and ghc-prim, so changes in either of those packages could change performance.
- We don't consult the CLC on every change to base, only those that the author thinks might affect performance. It'd be the same here. Changes to ghc-base that affect the performance of base functions should be discussed with CLC.
One Good Thing would be to expand the performance tests in the base test suite, and that is something the CLC might consider. But I see no systematic way to guarantee that changes to GHC or its ecosystem of libraries won't have an effect on the perf of one or more base exports. We just all have to do the best we can -- and that "best" is unaffected by whether the code happens to live in base or in ghc-base.
Another bit of technology is on its way: deprecated exports. See GHC proposal #134, GHC ticket #4879, and an upcoming merge request. This will make it easier when we want to remove internal modules from base and export them from ghc-base only.
Apparently the Haddock issue really isn't an issue. For example, here is GHC.Exts which is mostly just a shim, and its Haddock page looks just fine.
@simonpj I do appreciate your input, but I don't enjoy derailing the conversation. Please share your opinion on "split base" in the relevant issue. As you aptly quoted, this proposal is deliberately not about it.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck. This is a false impression: there is a long queue of already approved proposals which GHC developers tarry to finish (MRs !8912, !10176, !10171, !10132) and another queue of proposals where the CLC is ready to vote, but MRs from GHC developers are still pending (#126, #133, #134). This makes me wonder even more about the benefits of excluding things from CLC purview.
Please share you opinion on "split base" in the relevant issue
You mean here? Yes, good idea. I will do that. My reason for floating it here is that, given the lack of enthusiasm for our proposal above, I wanted to gauge the CLC's enthusiasm for:
- removing internal modules from the base API and putting them in ghc-base
Question: would that plan meet with CLC approval? I am keen to pursue a path that enjoys CLC support.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck
Not at all. It's rather that I want a clear way to signal to users that a particular data type or function is internal, and should only be used if you want to be exposed to GHC internals. We have no way to signal that at the moment, and that inevitably leads users to accidentally depend on things they shouldn't, through no fault of their own; and then creates pressure to keep these accidental and unintended interfaces stable into the future. By providing a structural way to express intent, we will avoid future pain.
Forgive me if this is a silly question, but does the exposed-modules/other-modules distinction in the .cabal file not provide enough distinction? If these are indeed cabal packages, there is support for "internal libraries" as well. This would allow an equally precise segregation of "within CLC scope" modules and "outside CLC scope" modules, without stumbling into the larger issue of actually cleaving base in two.
does the exposed-modules/other-modules distinction in the .cabal file not provide enough distinction?
Alas, no. other-modules cannot be imported, ever. But high-performance users specifically want to reach into the internals of GHC (e.g. the representation of arbitrary-precision integers), even at the cost of being exposed to churn in those internals. Performance testing and debugging are other reasons that we want it to be possible to import internal modules. If cabal provided a 3-way distinction (hidden, internal, external), with the latter two being exposed, that would indeed address the problem.
But changing cabal would take time, and some people say "no need to change cabal, just split your package into two", so it's not a slam-dunk change.
(With the new feature of multiple public library components in cabal, this split could be made easier to manage)
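For a flavour of the kind of "reach into the internals" meant here, the sketch below inspects the current representation of Integer. It assumes a direct dependency on ghc-bignum, and the IS/IP/IN constructors are an implementation detail that may change between GHC versions, which is exactly the churn such users accept:

```haskell
import GHC.Num.Integer (Integer (IS, IP, IN))

-- True for Integers that currently fit in a single machine word.
isSmallInteger :: Integer -> Bool
isSmallInteger (IS _) = True    -- small, word-sized representation
isSmallInteger (IP _) = False   -- large positive bignum
isSmallInteger (IN _) = False   -- large negative bignum

main :: IO ()
main = print (map isSmallInteger [42, 2 ^ (100 :: Int)])
```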
Let's go back to this very proposal:
If we agree to internal modules, excluded from CLC purview, while maintaining PVP adherence... then that doesn't seem too bad. Although one could argue it might cause churn, because of more aggressive bumping of the major base version... I don't think it matters much, since GHC ships with base anyway (although theoretically, a GHC patch release could introduce a major base bump, which can be problematic... we might just have to avoid it).
In case we want base to be reinstallable and decouple it from GHC, then we can execute the base split. Because then, base PVP is much more useful/important than it is now.
So that could be a possible stopgap.
My analysis is similar to @hasufell here.
I'm uneasy with explicitly granting PVP exclusions. Given that the proposer (or anyone else) cannot recall a use case, I'd rather omit this for now. FWIW, CLC does not normally opine on versioning of base; GHC developers are bumping versions on their own.
I'm fine with documenting some (potentially, a wide range of) modules as internal. In my view this is a necessary prerequisite for a potential future package split into ghc-base and base, if any.
I'm generally agreeable with allowing "additive" changes to "internal" modules, potentially even without CLC approval. There is a certain subtlety about who is to decide that a change is purely additive. There is the usual complication with type classes and instances: adding a new type class or instance to base could have far-reaching consequences, so it might be worth a consultation.
There seems to be an underlying notion that asking for CLC approval is a burden and a bottleneck
Not at all.
Interesting. I'd say that excluding modules from CLC purview is the most contentious part of the proposal. If it is not a key point, I'd suggest (in my private capacity) the following:
- Document a wide range of modules of base (roughly in line with Ben's analysis) as "internal".
- Accept that breaking base, even in its "internal" parts, is not in our best interest.
- Do not put class Profunctor into, say, a GHC.Internal.Profunctor module, because everyone will ignore warnings about instability and just go for it.
@simonpj @bgamari does it look reasonable to you?
I am a little confused here. Ben is proposing to provide some structure where there is currently none (in effect), so that the GHC team and the CLC can do a better job of managing change, and so that everyone else can better understand what is stable and can be relied upon, and what is internal and more liable to change. This is a good thing, yes?
Does it make sense to object on the grounds that any change anywhere might be visible and therefore every change whatsoever going forward to anything in GHC must be subjected to a CLC approval process? This seems to be the logic of the objections, applied consistently.
Can we not make things a little better without trying to make them perfect? For somebody outside of GHC and the CLC, Simon's proposal appears perfectly reasonable.
[I wrote this comment without realising I was looking at the thread without @Bodigrim 's latest comment which it does not reflect — sorry.]
@Bodigrim's proposal seems reasonable to me, but the devil could be in the detail. I will be interested to hear what Ben and Simon think.
(The proposal eventually approved in this thread is https://github.com/haskell/core-libraries-committee/issues/146#issuecomment-1591871779 — Bodigrim, Sep 2023)
1. Background
Currently the base package exposes many internal details of the implementation of base functionality. By "internal implementation details" we mean functions and data types that are part of GHC's realisation of some exposed function, but which were never intended to be directly used by clients of the base library. For instance, the GHC.Base.mapFB function is a necessary exposed part of the fusion framework for map, but one which GHC's authors never intended users to call.
This lack of clarity is bad for several reasons:
Users have no way to know which functions are part of the "intended, stable API" and which are part of the "internal, implementation details". Consequently, they may accidentally rely on the latter; they simply have no way to tell.
GHC's developers are hampered in modifying the implementation because too much is exposed. This imposes a high backward-compatibility burden, one that is an accident of history.
This status quo leaves much to be desired: users tend to rely on any interface available to them, and therefore GHC developers are susceptible to breaking users when changing implementation details within base. On the other hand, there is a clear need to be able to iterate on the implementation of GHC and its base library: fixing compiler bugs may require the introduction of new internal yet exposed primitives (c.f. the changes made in the implementation of unsafeCoerce in GHC 9.0) and improving runtime performance may require changes in the types of exposed internal implementation details (c.f. GHC #22946).
These difficulties are discussed in CLC #105.
2. Proposal
We propose to classify the modules of base into three groups:
- Hidden: these are simply the existing non-exposed modules (other-module in Cabal terms). No change here.
- External: these modules comprise the public API of base. They are listed in the exposed-modules Cabal section.
- Internal: these modules are part of the internal implementation of base functions. They are also listed in the exposed-modules Cabal section.
As of today, all modules are either Hidden or External; the CLC policy is that the API of all exposed modules is subject to CLC review.
The main payload of this proposal is as follows.
2.1 Codifying the Internal vs External split
Our proposal is simply to declare whether a module is Internal or External, using some out-of-band mechanism like a publicly visible list.
However, future reorganizations (notably HF tech proposal #47) might split base into two packages:
- ghc-base, all of whose exposed modules are Internal.
- base, all of whose exposed modules are External.
That would codify the distinction between Internal and External, which would be a Good Thing. But the burden of this proposal is simply to make that distinction in the first place, and start a dialogue about which modules belong in each category.
Incidentally, the Stability Haddock field of a module is not the same as the Internal vs External distinction. A module could be External (i.e. designed for external callers), and yet experimental and not yet stable. That seems to be the intended purpose of the Stability field, although it is not well described anywhere (please tell us there is a good specification).
We propose to document internal modules via a yet-to-be-named Haddock field.
2.2 Module by module summary
To make the discussion concrete, we have characterized each of the exposed modules in the GHC.* namespace along three axes:
These findings, along with the stability indicated by the modules' Stability Haddock field, are summarized in this spreadsheet. We then used these assessments to define an action plan (seen in the "Action" column) which will bring us closer to the goal of clearly delineating the stable interface of base. We do not intend to pursue this plan as one atomic change; rather, we intend for this plan to be an aspiration which we will iteratively approach over the course of the coming years, largely driven by the needs of the GHC developers.
The proposed actions fall into a few broad buckets:
In the sections below we will discuss some of the reasoning behind these proposed actions and draw attention to some open questions.
3. The question of GHC.Exts
Historically GHC.Exts has been the primary entry-point for users wanting access to all of the primitives that GHC exposes (e.g. primitive types, operations, and other magic). This widely-used module poses a conundrum since, while many of these details are quite stable (e.g. Int#), a few others truly are exposing implementation details which cannot be safely used in a GHC-version-agnostic way (e.g. mkApUpd0#, unpackClosure#, threadStatus#). There are at least two ways by which this might be addressed:
- Keep only the stable subset (e.g. Int#, Weak#, newArray#, etc.) in GHC.Exts, leaving the rest to only be exposed via GHC.Prim (which should not be used by end-users), or
- Declare GHC.Exts to be unstable and export the stable subset from another namespace (e.g. Word# and its operations could be exposed by GHC.Unboxed.Word, as sketched below).
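A minimal sketch of the second option; the module name GHC.Unboxed.Word comes from the text above, while the export list is purely illustrative:

```haskell
{-# LANGUAGE MagicHash #-}
-- A narrowly scoped, stable-by-construction module re-exporting only a few
-- unboxed Word primitives, leaving GHC.Exts free to change.
module GHC.Unboxed.Word
  ( Word#
  , plusWord#
  , minusWord#
  , timesWord#
  ) where

import GHC.Exts (Word#, plusWord#, minusWord#, timesWord#)
```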
4. Non-normative interfaces
Several interfaces exposed by base intentionally reflect internal details of GHC's implementation and, by their nature, should change to reflect changes in the underlying implementation. Here we call such interfaces "non-normative", as they are defined not by a specification of desired Haskell interfaces but rather by the system that they reflect.
One such module is GHC.Stats, which allows the user to reflect on various statistics about the operation of the runtime system. If the runtime system were to change (e.g. by adding a new phase of garbage collection), users would expect the module to change as well. For this reason, we mark such non-normative interfaces as "internal".
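For context, a small usage example of this non-normative interface; the exact fields available track whatever the current RTS provides, and the program must be run with statistics enabled (+RTS -T -RTS):

```haskell
import GHC.Stats (getRTSStats, gcs, max_live_bytes)

main :: IO ()
main = do
  -- getRTSStats throws unless the program is run with the -T RTS option.
  stats <- getRTSStats
  putStrLn ("GC count so far: " ++ show (gcs stats))
  putStrLn ("Max live bytes:  " ++ show (max_live_bytes stats))
```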