Closed simonpj closed 1 year ago
I don't have any thoughts on your question. However, I do have thoughts about the class, because of some work I was doing recently. In particular, it would be nice to have one or two associated types, and probably another method:
RoundTrips
, which when 'True
guarantees that tagToEnum# (dataToTag# x) = x
. This holds for enumeration-like types (which may be GADTs and thus not Enum
).NumCons :: Nat
, the number of constructors.numCons :: Int
, the number of constructors.it's a change, but a backward-compatible change
Is it backwards-compatible? That means that everywhere a user can successfully compile dataToTag#
today they can compile dataToTag#
after the change. How about
myDataToTag :: forall (a :: Type). a -> Int#
myDataToTag = dataToTag#
That compiles today. Will it compile after this change? I doubt it. But have I missed something here?
That compiles today. Will it compile after this change?
Ah, you are right. Of course: myDataToTag
can't be that polymorphic: indeed that's the point!
So, fair cop: it's not backward compatible if you write over-polymorphic functions involving dataToTag
.
What about newtype wrappers of various sorts? Will dataToTag#
still be able to peer through those?
What about newtype wrappers of various sorts? Will dataToTag# still be able to peer through those?
Can you give concrete examples of what you mean? DItto for your earlier suggestions, not all of which I could parse. Thanks!
@simonpj dataToTag#
and tagToEnum#
are both lightweight "generic" functions, but they're missing some metadata. If we have a TagToEnum
class, my round trip thing is irrelevant, so ignore that. But someone who has something TagToEnum
and DataToTag
is relatively likely to want to know how many constructors the type has. That will let them know things like how many bits are required to serialize the type.
And we propose to define
dataToTag :: forall (a :: TYPE LiftedRep). DataToTag a => a -> Int dataToTag' :: forall (a :: TYPE UnliftedRep). DataToTag a => a -> Int
as wrappers for
dataToTag#
that return a civilised boxed Int.
dataToTag#
is usually used in low-level, performance-critical manipulations, so I would not say that there is users' demand for "civilised boxed Int
", if it simplifies.
The newtype question: today we can apply dataToTag#
to All
to determine whether it's true or false. Will that remain the case? I don't think it necessarily needs to, but it would be good to make an intentional decision.
A symmetry question: I don't see a TagToEnum
constraint counterpart proposed above. That seems very sad; I would surely want both. Perhaps the answer is performance? tagToEnum#
is not operationally polymorphic today, as I understand it. Would it be possible to make it so, and yet be sure to specialize it even with -O0
?
I just discovered that tagToEnum#
doesn't work for GADTs. That makes it much less useful than I imagined it to be. Oh well.
Can I ask: does the CLC see dataToTag#
as part of the API for base that users can rely on (and hence which the CLC creates) or as part of GHC internals, purely an internal implementation matter?
If the former, we are totally open for the CLC's guidance. Eg. no dataToTag
and dataToTag'
is fine with me.
The newtype question: today we can apply dataToTag# to All to determine whether it's true or false
I see. To spell it out, given
newtype All = MkAll Bool
is dataToTag# (x :: All)
well typed? And if so what answer does it give?
Today I think the answer is "yes it is well typed", and would return the tag on the enclosed Bool. I don't know if that's what you want, and would welcome the CLC's guidance. It's a good question.
We can implement either behaviour I think; but rejecting the above is certainly easier. Accepting it would lead to more design questions like "what if the data constructor is not in scope".
My instinct to make a conservative choice now (reject), and loosen up later if we find compelling applications.
A symmetry question: I don't see a TagToEnum constraint counterpart proposed above
There will certainly be a TagToEnum
follow up. But I thought it'd be simpler to do a trial run on a more focused topic.
But someone who has something TagToEnum and DataToTag is relatively likely to want to know how many constructors the type has. That will let them know things like how many bits are required to serialize the type.
Hmm. I'm not convinced. Can you show some compelling examples? I suspect you'll quickly need the Data class anyway if you want to do anything polymorphic. Currently dataToTag#
is essentially always used monomorphically.
Can I ask: does the CLC see
dataToTag#
as part of the API for base that users can rely on (and hence which the CLC creates) or as part of GHC internals, purely an internal implementation matter?
It's kinda borderline, but I'm inclined to say that it's in scope for CLC because of a new type class.
@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int#
as a primop?
Hmm. I'm not convinced. Can you show some compelling examples? I suspect you'll quickly need the Data class anyway if you want to do anything polymorphic. Currently
dataToTag#
is essentially always used monomorphically.
tagToEnum#
bans GADTs, which makes it unsuitable for what I wanted, so here's a more simplistic example. Suppose I want to serialize and deserialize arrays ("words") consisting of some enumeration type ("letters"). Efficient serialization using dataToTag#
takes little code; there's no need to specialize to different types. Deserialization must invert serialization, so it must use tagToEnum#
. But for that, it needs a dictionary. And it also needs to know how many bits to use for each element.
What I actually wanted to use this for, that would require more changes to work:
dependent-map
offers a type
type DMap :: (k -> Type) -> (k -> Type) -> Type
which is basically the same as Data.Map
, but with different kinds (and keys that are usually GADTs). I want to write something similar for the special case of keys with only nullary constructors. Rather than a binary tree, I want a sparse SmallArray
(like one node of a HashMap
). This requires knowing at the type level whether the number of constructors is appropriate for this representation, dataToTag#
to get array indices, and a (thoroughly unsafe) use of tagToEnum#
for indexed traversals.
@simonpj could you please elaborate why do you need a type class? Could we just have
dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int#
as a primop?
I don't think dataToTag#
is supposed to make sense for function types. And somewhere (above? In the GHC ticket?) there's mention of the fact that it breaks parametricity; that's generally unfortunate, but it's not the only primop to do so.
@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?
Well, dataToTag#
is, as its name suggests, a way of finding the tag of a data constructor. So you need to be sure that you are applying it to a data constructor and not, as David says, to a function. Or even perhaps a newtype. This is a way to make sure.
Does breaking parametricity matter? Well, foldr/build fusion becomes unsound.
tagToEnum# :: Int# -> a
is a more compelling example, where a
has to be a data type; but there is a nice duality here.
Well, dataToTag# is, as its name suggests, a way of finding the tag of a data constructor. So you need to be sure that you are applying it to a data constructor and not, as David says, to a function. Or even perhaps a newtype. This is a way to make sure.
I have no idea if this is true, but are there perhaps other functions in the future that might also be interested in this? If so, that suggests maybe the type class you want (or perhaps even just a built in IsData :: Type -> Constraint
constraint) would do the job, rather than bundling this into DataToTag
? Or maybe DataToTag
is exactly this class, but a more general name (IsData
or something) would be more suitable.
I may well be picking bike shed paint colours out before we've even got the bike shed, though.
GHC.Base
sounds to me like something that users are not supposed to think is stable?
@Ericson2314 the situation is as follows:
ghc-prim
, does not currently mandate a CLC vote. That is fine. GHC.Base
is an ultimate source of many core type classes, e. g., class Monad
, changes to which do require CLC vote. GHC.Base
. Strictly speaking, such changes require the same level of scrutiny as changing class Monad
. @simonpj how magical is the proposed new type class? Can users define their own instances?
Changing primops, exported from ghc-prim, does not currently mandate a CLC vote. That is fine.
Once again I ask: what is the criterion you are using here. I am super keen to have a clear criterion for
I am really keen to know! You clearly have a criterion @Bodigrim: can you say what it is? Talking of "core type classes" is just not precise. Is DataToTag
a core type class?
I think your implicit criterion may be:
GHC.
prefix.Let's review your bullets in the light of this proposed definition:
GHC.*
module.GHC.*
module. For example Monad
is exported by Control.Monad
GHC.*
module.So, is that the definitino? And if not, what is the definition? Thanks! Clearing this up once and for all would be incredibly helpul
Here is an example of a criterion that might work:
Internal
as a component of the module nameThis "internal modules" convention is widely and successfully used in the Haskell ecosystem.
[EDIT: corrected Unstable
to Internal
]
For base
could we not use GHC.*
as the prefix? I think we are close to that already, so it'd be a much less disruptive change than renaming all GHC.*
to Unstable.*
(Separately, I though the widely-used convention was Internal
not Unstable
. At least that's what I've been told by others. But regardless, I'd love to make progress on base
first, since a system-wide convention seems elusive.)
Separately, I though the widely-used convention was Internal not Unstable
Oh, I beg your pardon, yes. I'll amend my post.
For base could we not use
GHC.*
as the prefix?
That might be rather misleading, since there are plenty of modules exported from the ghc
package whose names start GHC.
and which are supposed to be public and reasonably stable, aren't there? I think a different stability naming convention between base
and ghc
would be rather confusing.
it'd be a much less disruptive change than renaming all
GHC.*
toUnstable.*
I agree that renaming GHC.*
modules in base
is very disruptive. I suggest it doesn't happen. Instead all such modules should be grandfathered in and simply marked prominently as "unstable" or "internal" at the top of their Haddocks.
That might be rather misleading, since there are plenty of modules exported from the ghc package whose names start GHC. and which are supposed to be public and reasonably stable, aren't there?
Are there? Most stuff comes through Prelude, or Data.List, or Control.Monad etc. What things do you have in mind? And if there are, would it not be better to export them through some more "blessed" module name?
I agree that renaming GHC.* modules in base is very disruptive. I suggest it doesn't happen. Instead all such modules should be grandfathered in and simply marked prominently as "unstable" or "internal" at the top of their Haddocks.
That is one possibility. It has the disadvantage that one has to know the list ot stable modules -- you can't tell from their names. But that's just my opinion. I am entirely content to submit to the Will of the CLC, if the CLC can come to a view on the matter. Thanks for contributing to the dialogue!
Sorry for my late arrival here. I have several remarks:
@simonpj could you please elaborate why do you need a type class? Could we just have dataToTag# :: forall {lev :: Levity}. TYPE (BoxedRep lev) . a -> Int# as a primop?
I included some discussion about this in (the current draft of) the overview note for the MR, which I have reproduced here for convenience.
Similar reasoning to point 2b also means that uses of dataToTag#
at a newtype are currently sometimes optimized less well than uses of dataToTag#
at its underlying data type. Point 3 relates to this GHC proposal which calls for a breaking change to dataToTag#
for this reason. To see the unspecified behavior suggested by point 1, try running the following program with today's GHC but with various settings:
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int(..), dataToTag#)
main :: IO ()
main = print (I# (dataToTag# flip))
Running interactively or compiling with -O0
I see 0 but with -O1
I see 2. (I think that's because flip
has arity 3.)
Is it backwards-compatible? That means that everywhere a user can successfully compile
dataToTag#
today they can compiledataToTag#
after the change. How aboutmyDataToTag :: forall (a :: Type). a -> Int# myDataToTag = dataToTag#
That compiles today. Will it compile after this change? I doubt it. But have I missed something here?
Correct: This code is currently OK but will fail to typecheck after DataToTag
is added, most likely with an error like No instance for ‘DataToTag a’ arising from a use of ‘dataToTag#’
. But the breakage is all compile-time-only.
dataToTag# is usually used in low-level, performance-critical manipulations, so I would not say that there is users' demand for "civilised boxed Int", if it simplifies.
The most compelling use for dataToTag#
in user code I have seen is to reduce boilerplate in Eq
or Ord
instances that aren't quite derivable. Here's an example from within GHC: Being able to say dataToTag lit1 < dataToTag lit2
instead of isTrue# (dataToTag# lit1 <# dataToTag# lit2)
or I# (dataToTag# lit1) < I# (dataToTag# lit2)
would be slightly prettier. But these wrappers are small, and if I am importing GHC.Exts
to get dataToTag#
anyway I am probably unfazed by such a small inconvenience.
@simonpj how magical is the proposed new type class? Can users define their own instances?
Rather magical: Users cannot define their own instances. (I suppose that since DataToTag
is a one-method class users can technically provide something like local instances via withDict
. But that's probably not very useful.)
I may well be picking bike shed paint colours out before we've even got the bike shed, though.
There is one un-answered question about the current implementation's interaction with backpack, and there are a few other non-essential tasks I'd like to take care of before merging. But the implementation is basically complete!
Here's a quick run-down of the basic user-facing design questions, and how the current implementation answers them:
DataToTag
)GHC.Magic
)DataToTag
? (Currently: GHC.Exts
and GHC.Base
)GHC.Base.getTag
be deprecated? (Currently: it isn't)Int
instead of Int#
be provided? (Currently: they aren't)DataToTag T
constraint be solved when not all data constructors of T
are in scope?
DataToTag T
constraint behave when T
uses DatatypeContexts
?
DataToTag
's special solving behavior look through newtypes? (Currently: it doesn't)That might be rather misleading, since there are plenty of modules exported from the ghc package whose names start GHC. and which are supposed to be public and reasonably stable, aren't there?
Are there?
I'm talking about the GHC API, which is a collection of modules exposed from the ghc
package, all of whose names start with GHC.
and some of which are presumably intended to be stable.
I'm talking about the GHC API, which is a collection of modules exposed from the ghc package, all of whose names start with GHC. and some of which are presumably intended to be stable.
Aha. I was talking exclusively about the base
package; the GHC
package is another matter, since every single module starts with GHC.
! A global consensus (across all packages) seems elusive which is why I'm trying to narrow the question to a single package, a package that is the CLC's primary focus, in the hope of a definite view emerging.
Yes, I understand that. I'm pointing out that using GHC.
to mean "internal, unstable" in one package but not in another seems liable to cause much confusion.
Changing primops, exported from ghc-prim, does not currently mandate a CLC vote. That is fine.
Once again I ask: what is the criterion you are using here. ... I think your implicit criterion may be:
* **The stable API of base is defined to consist of the exports of all modules _other than_ those with a `GHC.` prefix.**
No, your hypothesis has nothing to do with the judgement quoted (and is wrong in general). I explicitly mentioned the premise: dataToTag#
is a primop, its definition is a part of compiler. The historical precedent is that GHC developers just change them as it seems fit (e. g., sized primitive types), because this is often mandated by internal GHC matters, requiring swift response. (This situation might be suboptimal, but let's leave the further discussion for another day).
However, the change you propose does not just changes a primop, it also introduces a type class. Given the leaking nature of Haskell type classes, this is a visible change for clients well outside of GHC internals. That's why I believe your proposal falls under purview of CLC.
I hope it clarifies. @simonpj @tomjaguarpaw please raise another issue, if you wish to continue a meta discussion, let's not digress wildly.
I hope it clarifies.
Sadlly, it does not. I will open a fresh issue.
Back to the proposal, I'm generally sympathetic to the idea, and the breakage seems minimal if any. I'd personally suggest to limit the proposal to class DataToTag
with dataToTag#
only and leave boxed wrappers for another day, as I imagine a lot of fight over the naming, and it would be pity to put this work on hold for a long time.
Once you have a final design and MR for user-facing changes, please run impact assessment and share results.
How visible is class DataToTag
? Does it show up in haddocks? Does it show up in :i
in ghci
?
Once you have a final design and MR for user-facing changes, please run impact assessment and share results.
Although I wouldn't say the user-facing design is finalized, I think the variants that have been discussed will have little or no effect on the extent of breakage. So I prepared a "quick-and-dirty" backport of the patch to ghc-9.2 and pushed it here, in case anyone wants to find out what clc-stackage
can show us. (Even this "quick-and-dirty" backport was a surprising amount of work. A lot has changed in ghc since the 9.2 branch was cut!)
How visible is
class DataToTag
? Does it show up in haddocks? Does it show up in:i
inghci
?
Similar to Typeable
and Coercible
, its built-in instances are created as-needed and do not show up in haddocks or :info
output.
I do not think I am well-positioned to personally perform further impact assessment at this time. If more impact assessment is desired, I politely request that someone else try building clc-stackage with the branch I mentioned in my last comment. (link)
Matthew, can you summarise for the CLC the changes you propose? It all appears above but not in one place. (It's a pity that these issues don't have a single place to change, that we are all looking at.)
I think a summary with
would be a good way to structure it.
As to impact assessment we seem a bit stalled. Matthew doesn't feel able to do it, and I'm not sure if anyone else will volunteer. Matthew, is the difficulty that the impact-asessment instructions don't work; or that you don't have a powerful enough computer; or something else?
Is it possible to make this change by introducing a new type class, with a new member, and either DEPRECATED
ing, or informally deprecating (with some explanation in the Haddocks) the old one? That would avoid the need for an impact assessment.
I don't really think there is anyone affected, the breakage surface is extremely small. I might be able to run impact assessment myself somewhere next week, but no promises.
It's likely to be less churn for the ecosystem to replace dataToTag#
silently with a more restricted version than deprecate and remove it, which will necessarily require patches. I also doubt it is possible to provide both versions at the same time: the very purpose of this is to remove an unsound primop.
@clyring I cannot complete the impact assessment because GHC from https://gitlab.haskell.org/clyring/ghc/-/tree/dataToTag-class-9.2 panics in multiple packages:
Building library for mime-mail-0.5.1..
[1 of 1] Compiling Network.Mail.Mime ( Network/Mail/Mime.hs, dist/build/Network/Mail/Mime.o, dist/build/Network/Mail/Mime.dyn_o )
ghc: panic! (the 'impossible' happened)
(GHC version 9.2.5:
applyTypeToArgs
Expression: dataToTagLarge# @Padding padding_aaAU
Type: forall {l :: Levity} (a :: TYPE ('BoxedRep l)). a -> Int#
Args: [TYPE: Padding, padding_aaAU]
Args': [padding_aaAU]
Call stack:
CallStack (from HasCallStack):
callStackDoc, called at compiler/GHC/Utils/Panic.hs:181:37 in ghc:GHC.Utils.Panic
pprPanic, called at compiler/GHC/Core/Utils.hs:279:17 in ghc:GHC.Core.Utils
Please report this as a GHC bug: https://www.haskell.org/ghc/reportabug
There are also proper type check errors in repa
, but I'm not sure what they have to do with DataToTag
changes:
Building library for repa-3.4.1.5..
[ 1 of 39] Compiling Data.Array.Repa.Eval.Elt ( Data/Array/Repa/Eval/Elt.hs, dist/build/Data/Array/Repa/Eval/Elt.o, dist/build/Data/Array/Repa/Eval/Elt.dyn_o )
Data/Array/Repa/Eval/Elt.hs:129:31: error:
• Couldn't match kind ‘'FloatRep’ with ‘'BoxedRep l11’
When matching types
a11 :: TYPE ('BoxedRep l11)
Float# :: TYPE 'FloatRep
• In the first argument of ‘touch#’, namely ‘f’
In the expression: touch# f state
In the expression:
case touch# f state of state' -> (# state', () #)
|
129 | = IO (\state -> case touch# f state of
|
@Bodigrim Thank you for trying.
I was unable to reproduce the mime-mail-0.5.1
panic using the build of that branch I made while preparing it. The failure message is mystifying. There shouldn't be any way for existing code to make a dataToTagLarge#
call to come into existence without its levity argument... My best guess is that perhaps some stale interface file is poisoning your build. Can you reproduce these panics with a clean cabal store?
I can reproduce the repa-3.4.1.5
failure. It is caused by the backport of ghc!5877 which was a pre-requisite to making dataToTag#
levity-polymorphic. I thought I removed the changes to the types of other primops while backporting that patch (at least modulo specificity of tyvars), but apparently I failed to do so. I will push a fix later today.
Matthew, is the difficulty that the impact-asessment instructions don't work; or that you don't have a powerful enough computer; or something else?
It's a combination of having a wimpy and unreliable machine with having several past experiences of WSL compatibility troubles.
My best guess is that perhaps some stale interface file is poisoning your build. Can you reproduce these panics with a clean cabal store?
I used a fresh machine, which has never built mime-mail-0.5.1
or other packages with the same error before, so unlikely to be a stale interface.
I will push a fix later today.
Ping me once done, I'll give it another try.
@Bodigrim: I have pushed a commit reverting the unrelated primop type changes. I also bumped the version number to 9.2.100 just to make a stale-interfaces issue less likely. (I really hope that's the cause of the mime-mail-0.5.1
problem because I have no clue what else it could be, and still can't reproduce it.)
@clyring this looks better. But could you please ensure that your branch is atop of 9.2.5? It seems to suffer from https://gitlab.haskell.org/ghc/ghc/-/issues/21964. I'm on M2.
The branch does contain the fix for that issue (74ca6191fa0), which is currently still the last commit in the ghc-9.2 branch.
Sigh. I switched to my old, x64 machine and happy to tell that the change does not seem to affect any Stackage packages, which is good.
Here's a quick run-down of the basic user-facing design questions, and how the current implementation answers them: ...
@clyring could you confirm that your comment remains up-to-date with the most recent developments and there were no design changes?
I'm surprised there's no breakage in clc-stackage. Yes, that comment remains up-to-date.
Dear CLC members, let's vote on the proposal to replace
GHC.Exts.dataToTag# :: forall (a :: Type). a -> Int#
with
type DataToTag :: forall {lev :: Levity}. TYPE (BoxedRep lev) -> Constraint
class DataToTag a where
dataToTag# :: a -> Int#
as detailed in https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8912. The motivation for the change is in the first half of https://github.com/haskell/core-libraries-committee/issues/104#issue-1434906614. The final user-facing design is described in https://github.com/haskell/core-libraries-committee/issues/104#issuecomment-1309210219. In theory, as explained in https://github.com/haskell/core-libraries-committee/issues/104#issuecomment-1302351502, this is a breaking change, but practically no package in clc-stackage
project is affected.
@tomjaguarpaw @chessai @emilypi @mixphix @cgibbard
+1 from me.
+1
-1 (would change to +1 if we don't expose DataToTag
from base
)
The net effect of this proposal on CLC-land (i.e. base
) is
getTag
DataToTag
from GHC.Exts
1 seems fine. Although this is an implementation detail which probably have never been exposed from base
(exposing from ghc-prim
seems sufficient) the change doesn't make anything worse, since it seems there are no consumers who used it at the bad type. 2 is not fine. DataToTag
is a GHC implementation detail and therefore should not be exposed from base
. Anyone who wants to use it should import it from ghc-prim
. Granted, we have a lot of implementation details that are already exposed from base
, but let's not add more.
Matthew and I would like to consult you about the API exposed to users of
base
for thedataToTag#
operator.Currently we have a primop
and a strangely named alias in GHC.Base
Of course this is all wrong:
dataToTag#
is way too polymorphic: it can't possibly work for every typedataToTag#
are to data types; but it's a fragile check,.So Matthew is fixing that by introducing
That fixes both things at one blow:
dataToTag#
now has a type class constrained type, so it's not over-polymoprhicSo far so good: it's a change, but a backward-compatible change.
But we'd also like to kill off the strangely named
getTag
while we are about it (with a deprecation cycle). And we propose to defineas wrappers for
dataToTag#
that return a civilised boxed Int. It would be nice to make these levity-polymorphic too, but you can't writebecause there is a levity-polymorphic binder
x
.An alternative would be to put dataToTag into the class like this
and now dataToTag can be levity-monomorphic. But the implementation is significantly more fiddly, because we have to build that dictionary on the fly.
The naming of the unlifted version is up for grabs. I suggested
dataToTag'
by analogy with foldl'