Closed bgamari closed 1 year ago
I don't think this needs change in PVP. The breaking change is already SHOULD (not MUST). People can say in documentation that modules are not part of PVP.
However, if you want to formalize that, you'll need to add an additional point like:
- Client defines orphan instance. If a package defines an orphan instance, it MUST depend on the minor version of the packages that define the data type and the type class to be backwards compatible. For example, build-depends: mypkg >= 2.1.1 && < 2.1.2.
e.g.
6b. Client imports .Internal
module. If a package imports an .Internal
module, it MUST depend on the minor version of the package exporting that module. For example, build-depends: mypkg >=2.1.1 && <2.1.2
.
Remember, the PVP is policy for both, library producers ("authors") and library consumers ("users").
I'd also like https://wiki.haskell.org/Import_modules_properly to be folded into PVP spec. That part is overlooked, and causes some pain. (TL;DR don't use unqualified imports if you want to use major bounds).
EDIT: Yes, I'm fully aware that would make using Internal
modules not very convenient. That is on purpose. But if the author of library doesn't provide strong guarantees for these, the downstream have to be defensive.
I'd also argue that e.g. bytestring
.Internal
modules will be a bad example. They are very stable, they just happen to expose bytestring
internals. (And bytestring-0.11
showed that a lot of stuff depends on these internals). cc @Bodigrim
Minor comment: the .Internal
point shouldn't be the first one.
I don't think this needs change in PVP. The breaking change is already SHOULD (not MUST). People can say in documentation that modules are not part of PVP.
Interesting. Can you say which clause in the current PVP you are referring to? I see "SHOULD" only in clauses 7 and 8.
An omission of the current PVP is that it makes no distinction between exposed and un-exposed modules of a package. I think, for the avoidance of doubt, that it would be helpful to say explicitly that clauses 1 and 2 (breaking and non-breaking changes) apply only to exposed modules. Changes to un-exposed modules come under claues 3 (other change). That is doubtless obvious to many readers, but (I can certify) not to all.
I don't think this needs change in PVP. The breaking change is already SHOULD (not MUST). People can say in documentation that modules are not part of PVP.
I don't understand; the "breaking change" paragraph explicitly says that breaking changes (e.g. entity removals, orphan additions, or type changes) MUST be result in a version increment.
We're formalizing a convention that has been used by people in the wild. But that doesn't mean this is the best way to signal "this module is exempt from PVP".
Perhaps not. However, given that this widely used convention is currently constitutes a violation of the PVP (see above), I think it would be reasonable to both formalize the status quo and simultaneously discuss alternatives.
I feel this is too much of a big change to apply ad-hoc without larger community consensus.
Of course; I fully expect there to be discussion on this matter. This MR is merely intended to start the conversation.
Further, this definitely is a breaking PVP change in my opinion. Tools assuming current PVP wording will not work correctly with this change and need adjustment.
What tools are you thinking of in particular? As far as I can tell, they would already be broken with a significant fraction of code found on Hackage today.
Further, this definitely is a breaking PVP change in my opinion. Tools assuming current PVP wording will not work correctly with this change and need adjustment.
What tools are you thinking of in particular? As far as I can tell, they would already be broken with a significant fraction of code found on Hackage today.
Any PVP compliance tool.
Technically, they would be correct to not treat 'Internal' modules specially. Otherwise, why are we raising this PR? So the bug lies within the hackage packages, from the PVP compliance tool POV.
I'm not saying this is a problem, but will require a new spec version.
I don't think this needs change in PVP. The breaking change is already SHOULD (not MUST). People can say in documentation that modules are not part of PVP.
Interesting. Can you say which clause in the current PVP you are referring to? I see "SHOULD" only in clauses 7 and 8.
An omission of the current PVP is that it makes no distinction between exposed and un-exposed modules of a package. I think, for the avoidance of doubt, that it would be helpful to say explicitly that clauses 1 and 2 (breaking and non-breaking changes) apply only to exposed modules. Changes to un-exposed modules come under claues 3 (other change). That is doubtless obvious to many readers, but (I can certify) not to all.
I remembered that 1. clause has SHOULD, the MUST there is IMO a tad bit too strong.
I don't think this needs change in PVP. The breaking change is already SHOULD (not MUST). People can say in documentation that modules are not part of PVP.
Interesting. Can you say which clause in the current PVP you are referring to? I see "SHOULD" only in clauses 7 and 8.
An omission of the current PVP is that it makes no distinction between exposed and un-exposed modules of a package. I think, for the avoidance of doubt, that it would be helpful to say explicitly that clauses 1 and 2 (breaking and non-breaking changes) apply only to exposed modules. Changes to un-exposed modules come under claues 3 (other change). That is doubtless obvious to many readers, but (I can certify) not to all.
I remembered that 1. clause has SHOULD, the MUST there is IMO a tad bit too strong.
https://github.com/haskell/pvp/commit/3e1cc4483245180f632803ea15f1c98c938079b3
Looks to me like it has always been there.
What would be the rationale to make it SHOULD? Wouldn't that render the guarantees of upper bounds useless?
What would be the rationale to make it SHOULD? Wouldn't that render the guarantees of upper bounds useless?
One example is packages like quickcheck-instances
which may add orphan instances in minor versions. If PVP were followed strictly, these changes would always cause a major bump, but the minor bumps is "lesser evil". (EDIT: issue #3)
I also remember discussing some changes which would required major bump, but it was considered better to do a minor bump. Unfortunately I don't remember where (IIRC some of ekmett's packages, not so long ago).
And also .Internal
modules where they do change "on will". Technically, PVP doesn't allow any exemptions.
After all
3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
But I understand that people make mistakes, and/or not always fully understand full implications so using MUST i.e. an absolute requirement is "safer".
Here we formalize the convention of using
.Internal
modules to provide potentially unstable interfaces.
"Unstable" does not necessarily mean "excluded from PVP". It's helpful to mark parts of the public API as unstable or unsafe (= potentially violating internal invariants), just bump a major version each time you change them.
.Internal
modules mostly stem from cabal v1-build
being hardly capable to deal with multipackage projects. In the modern setup if you want to expose internals of a package foo
, you can create foo-internal
package with rapidly increasing major versions, while foo
would re-export only stable parts of foo-internal
.
@Bodigrim how would that work for base or other boot libraries? We have to add new boot libraries? Do we have a process for that?
There is no process, just a good will of GHC team. Cf. recent split of Cabal
into Cabal
and Cabal-syntax
.
FWIW bytestring
and text
treat changes to Internal
modules seriously. Such modules are marked as unstable and unsafe, but we strive not to break them without major version bump.
I support this change to PVP and I'd like to see it being accepted! 🙏🏻
The Internal
module convention was folklore knowledge for a good amount of time, and it makes sense to formalise it and align the community across common naming convention.
There exist a real problem in the design of public APIs:
So, there should be some way to satisfy both. I see the Internal
module convention as the easiest way to achieve this.
Strictly speaking, the easier way would be to write a single documentation line in the module description, but having a separate module namespace brings more visibility and makes the life of static analysis tools easier.
Addressing the specific points and questions in this PR:
Have we discussed other options? I think this needs to be discussed.
What are the other options? So far, I've seen only two suggestions in this thread:
internal-modules
in .cabal
:
This also allows easy static analysis and doesn't depend on module names.
As an author of a Haskell static analysis tool, I can assure you there's no easier thing than checking a module name for a substring 😌
Also, AFAIK, there's no existing static analysis tool that actually checks PVP integrity of Haskell packages.
foo-internal
packages
I feel this is somewhat ambiguous.
- what about Foo.Internal.Bar?
- what about Foo.InternalBar?
Both include the string .Internal. What if a package is called Internal?
I propose to create the "Definitions" section, and put a definition of an internal module in there. Something like this:
An **internal** module - we say that the module is internal if its a non-root module with the name `Internal` or if it's nested inside the `.Internal.*` namespace.
Examples:
* `GHC.Internal`
* `Logger.Internal.Tag`
Why not formalize what people actually do, which seems to work reasonably well in practice today?
Because that's how specs generally deteriorate and simply become documentation of current (good and bad) practices.
I'd rather work towards making the spec more general and leave space for languages and tooling to decide on non-critical things.
I also don't see any particular time pressure here. Are we in a rush?
"Avoiding the formalisation of existing practices" doesn't sound to me like a universal rule to follow. Every case should be considered separately. What if the practice is really good? Are we going to avoid it just because we should formalise something different?
As I mentioned previously, I see a real problem and I don't see a better alternative. Getting more feedback could help to get more options 😌
I see one problem with the Internal
convention:
Some packages provide Internal modules, but follow PVP regardless for those. This might become a problem if we want to make GHC aware of it and emit warnings. One could argue they can silence it with OPTIONS_GHC
pragma, but it might still be confusing for consumers.
Why does this matter? Because as @Bodigrim noticed: internal doesn't equal unstable (or "exempt from PVP").
With the cabal file stanza I see the following problem:
It's now a cabal thing. That means GHC can't check it ad-hoc and that will also affect e.g. ghc --make
or custom build systems (they exist, for large popular projects).
I want to repeat myself (and @Bodigrim and @hasufell).
Internal modules doesn't mean Unstable. They are just Internal, something you shouldn't need but it's there in case you need them.
Don't make Internal
module unstable. Or if you do, I'd expect you to come up with a name for internal but stable modules, make patches to bytestring
, text
(at the very least) and affected packages on Stackage.
I want be able to depend on bytestring
internals without being required to specify minor upper bounds like bytestring <0.11.5.0
.
As another comment. The patch doesn't discuss any changes in client specifications. It should, how users of libraries should specify upper bounds if they import .Internal
modules?
Under the status quo there are three options for a package maintainer:
This issue suggests adding
.Internal
name, don't churn the major version each time a change is made to the unstable, internal modules, making most of the package consumers happyIf you don't want the PVP to adopt the proposal here, which of 1-3 would you say is acceptable?
I want to note an overall tension regarding modularity and versioning that I fell it would be good to have some way to remedy. I feel this tension is sufficiently pervasive and underlies enough problems that it is worth giving a name to: The Versioned Modularity Problem.
If I am using a package and it does not expose some constructor, but I need to make use of that constructor, then I am very frustrated and wish the author had expose that constructor. Therefore, packages should always expose their internals!
However, if I am using a package and it does expose some constructor, but only "just in case", and now that constructor changes in a way that no normal end user cares about, that is still a breaking change and I have to change my upper bounds for no good reason, as do all other users of that package. Therefore, packages should avoid exposing their internals!
The versioned modularity problem is a puzzle: how can one structure packages so that violations of intended encapsulation do not disrupt the monotonicity properties of package versioning.
I suggest that the idea of "Internal as PVP exempt" is an attempt to solve this problem, but one which people have provided some pretty good arguments against.
Yes, I think in an ideal world individual identifiers would be versioned, not packages, but we are far from that world.
There are fourth and fifth option. Both in use today.
Fourth: Specify stability per module basis. This is flexible as it doesn't make any Internal
module unstable by default, nor really prohibits making other modules than Internal
unstable, like Experimental
mdoule or just a module for a feature which isn't yet stabilised but actively developed 1.
Fifth: Have a separate package (e.g. foo-internals
) and re-export only stable part of it in foo
. (I remember seeing someone doing something like this, but couldn't find right now).
I have thinking about the latter in context of aeson
. There is a very stable API parts: ToJSON
and FromJSON
with decode
and encode
(e.g. what servant
uses). But servant
still needs to keep up with changes elsewhere in aeson
. There could be a package which re-exports these very stable parts.
I think it could stayed on the same major version for a decade as e.g. new instances are just minor additions. That doesn't fit Internal
pattern at all.
So there are ways to work around "Versioned Modularity Problem". I agree they are a bit of a burden for a developers of the libraries, but no additional rules for the consumers (which hopefully are more).
1 We may disagree whether that a good development/release practice, but it's there if someone needs it. For example churning aeson
versions for an experimental feature not many users are even aware of would be a pain ecosystem wise.
Fourth: Specify stability per module basis
I'm not sure what this means. For one of the things it might mean, doesn't it have the same problem as 2?
Fifth: Have a separate package (e.g. foo-internals) and re-export only stable part of it in foo
I find this quite plausible.
@tomjaguarpaw
I'm not sure what this means. For one of the things it might mean, doesn't it have the same problem as 2?
No. If a maintainer exempt a module from versioning, marking it as unstable, than no need to churn major versions. Most package consumers won't care/notice. E.g. https://hackage.haskell.org/package/aeson-2.1.1.0/docs/Data-Aeson-Internal.html
That's a practice which OP tries to formalise, but OP wants to "hardcodes" all Internal
modules to be such, which I disagree with. Not all Internal
modules are unstable, exempt from PVP, e.g. not in text
or bytestring
.
Ah, I see. You are suggesting that the issue should be a matter of the package maintainer providing some documentation of the PVP exception, rather than the exception applying to all modules with .Internal
in the name. I find that quite plausible too.
Fifth: Have a separate package (e.g.
foo-internals
) and re-export only stable part of it infoo
. (I remember seeing someone doing something like this, but couldn't find right now).
I think this is the best solution. attoparsec
does something similar: attoparsec-internal
exposes guts and low-level stuff, required for granular testing and benchmarks, and attoparsec
proper provides a high-level interface. Both are fairly stable however. In principle, attoparsec-internal
could have been released as a separate package, but nobody asked yet.
Also see http://nikita-volkov.github.io/internal-convention-is-a-mistake/, I'm pretty much in agreement with @nikita-volkov there.
I'm sympathetic to the foo-internal
package approach under the assumption that it follows PVP regardless.
As such, I'm -1 on formalizing the .Internal
convention to signal PVP exemption.
@Bodigrim do we exercise formal voting on PVP matters similar to base (including hackage trustees of course)? This ticket has been open for a while and it's correlated to the base split discussion. I suggest to exercise voting in a week or two.
I find the motivation for this ticket rather week. As other have outlines, we have the facilities to do this today. (-internal
packages). And yes, I think it would be good to have them follow PVP as well. The consumer can always put very generous bounds on them if they so want. No change to the pvp, nor any existing tooling, or existing behaviour is really needed. The social topic of having people follow the pvp (or nudge them to put internal stuff into an -internal
package) remains the same afaics.
@Bodigrim Thanks for the link to the -internal
package approach! I can see how this approach looks appealing.
I see only two downsides to it:
iris
, I have an internal module with a single record type. I don't really want to create and maintain a separate package just for this module. I want a quick and easy way to say "if you really need to depend on unstable internals, you have the opportunity but I don't provide any guarantees".Thus being said, I find the option suggested by @phadej the most appealing:
Again, I'm up for any approach that solves the problem with the least required effort. However, the Haddock documentation doesn't list possible values for the Stability
metadata field and is thus prone to ambiguous misinterpretation.
From Haddock docs:
The Copyright, License, Maintainer and Stability fields should be obvious.
Great 👍🏻
If I were to move the conversation towards the "documentation route", I'd suggest introducing the following three values:
stable
: follows PVP, authors try to maintain backward-compatibilityexperimental
: follows PVP, but the API may change significantly between major versionsinternal
: doesn't follow PVP, API may break between minor releases without prior notice and/or migration guidesMaintaining a separate package requires more time and effort
If you are going to change internals, you'd need to release a new version of package(s) anyway. I don't see how there is truly additional work?
And if you depend on the internals of other packages, you implicitly agree that things change, and having these internals change with a policy is a lot better than changing without any policy (or an ad-hoc one).
In iris
you have a single module. But why it's exported? If you need it for testing (i.e. not really meant for public "unsafe" use), why not using an internal sublibrary?
In iris you have a single module. But why it's exported? If you need it for testing (i.e. not really meant for public "unsafe" use), why not using an internal sublibrary?
In my experience, there're much more problems with hidden internals when people can't access functions they want and much fewer problems with unstable internals when it's said explicitly that these parts of the codebase are for internal usage only.
I'm trying to solve a pragmatic problem here: let people use the internals of my lib if they really want with as little work as possible from my side.
as little work as possible from my side.
I'd say this is where we disagree in our philosophies. I don't think that allowing to tinker with internals needs to require as little work as possible from the authors of a library. I don't think it should require a lot of work, but some work should be acceptable.
There is always an option to not expose internals (no work), or make the internals stable (potential version churn), or expose needed functionality in some other stable way (some thinking and implementation work).
My recent experience exporting internals in a module marked as "may change at any moment, use at your own risk" doesn't really work: https://github.com/haskell/aeson/commit/6c2050fa4914ce793441ec976e90b824bff9e734#r102022391
So even you say this module may change at any moment, someone will complain anyway, even if you warned them beforehand.
Note: I'm being kind not just removing that module / its contents, but waiting until there is other needs to do major bump. And warning people that it will soon go. Nothing is ever enough. </rant>
I'm trying to solve a pragmatic problem here: let people use the internals of my lib if they really want with as little work as possible from my side.
Sorry, but this is not a good basis on which to develop a specification on.
The result of being lax and allowing ad-hoc escape hatches is this: chaos.
If this requires maintainers to do a bit of additional work, so be it. If they don't want to do it, they can stay out of hackage and maintain their library on Github.
@Bodigrim do we exercise formal voting on PVP matters similar to base (including hackage trustees of course)? This ticket has been open for a while and it's correlated to the base split discussion. I suggest to exercise voting in a week or two.
Yes, a formal CLC vote + approval from Hackage Admins would be due.
I am reluctant to trigger a vote, because AFAIU this is not the intention of the proposer just yet, see the closing line of https://github.com/haskell/pvp/pull/46#issuecomment-1330038905. I assume when the time comes the proposer will write up a more extensive proposal with justification and discussion, otherwise it is very thin.
I'm trying to solve a pragmatic problem here: let people use the internals of my lib if they really want with as little work as possible from my side.
Besides sending a patch or making a fork, as a nuclear option users can always define a copy of your data type and unsafeCoerce
it back and forth. They would have to be extremely careful when updating the version of your library, but that would be the case if your library's Internal
modules were exempt from PVP as well.
Some thoughts that arose while reading this:
I suppose the themes are mistaking social politeness with robotic certainties, and shoehorning needs across different dimensions into one solution.
Undecided on the proposal as a whole, still marinating.
My loose thoughts:
The PVP may provide some general guidelines, but it is ultimately the responsibility of the package authors to define the stability of their packages, i.e., what is or is not a breaking change for the purposes of incrementing version numbers. Everyone has a different idea of the behaviors they care about.
A package versioning policy is inherently a fuzzy specification because of the subjectivity in what constitutes a "breaking change". The PVP lists some concrete examples of breaking and non-breaking changes, but it's far from exhaustive. Is performance observable? Is the order in which a program reads files observable? Would you care if GHC compiles modules in a different order?
I'm not sure I would even treat the examples that are provided by the PVP as absolutes. For a type-level programming library like first-class-families, most of the definitions are type instance
declarations, and as type-level entities, any change to those would be considered a breaking change in general. But as it is meant to be a type-level equivalent of a standard library like "base", I could also argue that various changes to type instance
in first-class-families are just refactorings, and not meant to be observable. I could follow a literal interpretation of the PVP and just increment the major version number every time I do a refactoring of type-level entities, but it seems wrong that I have to version my package differently just because I'm writing programs "at the type level".
This fuzzy interpretation does make it more difficult for tooling to implement the PVP (whatever that means), but (1) there is no widely used such tool to my knowledge, (2) if the tool aims to usable in practice, say to be somehow deployed on Hackage, it will need a way of dealing with exceptions to the rule because (i) breaking changes at the level of function behavior is undecidable if you even have a formal definition of it, (ii) a lot of (most?) existing packages on Hackage do not follow the PVP closely (because the PVP is underspecified as argued above), (iii) mistakes happen. That could just mean the tool implements a simplistic rule regardless, and people will know to ignore warnings on modules that are .Internal
.
I think the PVP is underspecified and the Internal
convention could be argued to either already be allowed by it, or any practical tool will have to deal with it anyway. So there isn't much point in codifying it explicitly.
The PVP lists some concrete examples of breaking and non-breaking changes, but it's far from exhaustive
That's exactly the point of a spec.
PVP says what must constitute a major version bump and leaves the option to the maintainer whether to be more aggressive (e.g. behavior changes, which are also discussed here: https://github.com/haskell/pvp/issues/10).
This is not "fuzzy". This is exactly how most specifications work (including the Haskell report). They're not exhaustive and are so deliberately.
Is performance observable?
This can't be answered easily, so PVP refrains from commenting on it. It's too compiler dependent and underspecified.
Is the order in which a program reads files observable? Would you care if GHC compiles modules in a different order?
That is not necessarily part of the programming interface. End-user programs are a completely different problem. However, again, this is discussed in https://github.com/haskell/pvp/issues/10
I'm not sure I would even treat the examples that are provided by the PVP as absolutes. For a type-level programming library like first-class-families
Well, I'm not sure PVP ought to be on top of all the latest type level stuff, which are all GHC extensions and not in any proper language spec/standard.
I think the PVP is underspecified
This is a feature. To a degree.
Well, I'm not sure PVP ought to be on top of all the latest type level stuff, which are all GHC extensions and not in any proper language spec/standard.
I could make a type level programming library with only classes and instances (no type families), and my point would still apply. Any change to those would be a breaking change if the examples in the PVP are to be taken literally, so a "type-level" translation of base would have to be versioned differently from base.
PVP says what must constitute a major version bump and leaves the option to the maintainer whether to be more aggressive (e.g. behavior changes, which are also discussed here: https://github.com/haskell/pvp/issues/10).
Thanks for clarifying this. This changed my mind about how to interpret the PVP. In that case I'm in favour of keeping the PVP minimal (and switching my practice from .Internal
modules to -internal
packages).
My point about versioning type-level programming still stands but that's for another discussion.
Can we come to a conclusion here? Is this even still relevant given the accepted tech-proposal wrt base split?
The accepted base-split proposal doesn't address internal modules in general. However, I think it reflects pretty clear consensus from leading community bodies (clc, ghchq, twg) that the internal convention is to be avoided and instead split packages with proper pvp are to be preferred. I think the current proposal should be closed on that basis. However, a proposal would be welcome to amend the faq to encourage the spit-module approach be adopted as a convention. Note that this is only changing the faq, since it is inline with the pvp as it already exists.
However, a proposal would be welcome to amend the faq to encourage the spit-module approach be adopted as a convention
Here we formalize the convention of using
.Internal
modules to provide potentially unstable interfaces.Addresses #8.