haskell / core-libraries-committee

95 stars 15 forks source link

Design for dataToTag# #104

Closed simonpj closed 1 year ago

simonpj commented 1 year ago

Matthew and I would like to consult you about the API exposed to users of base for the dataToTag# operator.

Currently we have a primop

dataToTag# :: forall (a :: Type). a -> Int#

and a strangely named alias in GHC.Base

getTag :: forall (a :: Type). a -> Int#
getTag x = dataToTag# x

Of course this is all wrong:

So Matthew is fixing that by introducing

type DataToTag :: forall {lev :: Levity}. TYPE (BoxedRep lev) -> Constraint
class DataToTag a where
     dataToTag# :: a -> Int#

That fixes both things at one blow:

So far so good: it's a change, but a backward-compatible change.

But we'd also like to kill off the strangely named getTag while we are about it (with a deprecation cycle). And we propose to define

dataToTag :: forall (a :: TYPE LiftedRep). DataToTag a => a -> Int
dataToTag' :: forall (a :: TYPE UnliftedRep). DataToTag a => a -> Int

as wrappers for dataToTag# that return a civilised boxed Int. It would be nice to make these levity-polymorphic too, but you can't write

dataToTag x = I# (dataToTag# x)

because there is a levity-polymorphic binder x.

An alternative would be to put dataToTag into the class like this

class DataToTag a where
     dataToTag# :: a -> Int#
     dataToTag :: a -> Int

and now dataToTag can be levity-monomorphic. But the implementation is significantly more fiddly, because we have to build that dictionary on the fly.

The naming of the unlifted version is up for grabs. I suggested dataToTag' by analogy with foldl'

simonpj commented 1 year ago

Tom as I understand it, you want to expose

getTag :: DataToTag a => a -> Int
dataToTag# :: DataToTag a => a -> Int#

That is, the same names as before, but with DataToTag class constraint. Is that what you want?

(The inconsistent naming is a wart, but perhpas one that the CLC thinks that the costs of making it consistent exceed the benefits?)

Naming aside though, can you do that without exposing the DataToTag class? Surely I must be able to write

myGet :: DataToTag a => a -> Int
myGet x = getTag x + 1

So I must be able to import DataToTag! It's not a GHC implementation detail: it's a proper type-class constraint on the types that you can all myGet on, isn't it?

tomjaguarpaw commented 1 year ago

I'm saying that both getTag and DataToTag/dataToTag# are GHC implementation details. People who want to use them should import them from ghc-prim. I don't understand why they'd be exposed from base at all, except by historical mistake in the case of getTag.

Since getTag is already exposed by base it seems prudent to reluctantly allow it to continue to be. DataToTag is not already and it shouldn't be. Users should import it from ghc-prim.

Does that help?

simonpj commented 1 year ago

Aha, thank you. That is indeed clearer. Two thoughts.

First.

I'm saying that both getTag and DataToTag/dataToTag# are GHC implementation details.

If that's what the CLC wants then fine. But I think that many people entirely uninterested in GHC's implementation will want getTag to help them build efficient Ord instances. I don't think it's an implementation detail at all.

If the CLC wants to remove it from base (and the CLC's purview), then perhaps DEPRECATEing getTag would be a good way forward. If you don't deprecated it, its type DataToTag a => a -> Int will mention a class that is not available from base which would be ... unusual. I don't know any package that exports a function whose type constructors are not importable.

Second

People who want to use them should import them from ghc-prim.

That might be difficut, because ghc-prim is the stuff we need to build the Integer package(s), which are in turn needed for much of base. So in general it is not necessarily easy to move stuff from base into ghc-prim.

Since the CLC's wish is that the external (stable) API of base is all exposed modules, and non-exposed modules can't be imported, the only way to expose an implementation detail is from another package. That is perhpas a further reason to split base into ghc-base and base, with base being the purview of CLC, and ghc-base being implementation details.

tomjaguarpaw commented 1 year ago

If that's what the CLC wants then fine.

I'm just speaking for myself here. The CLC makes decisions by majority vote. Whether that leads to the CLC "wanting" something, or to coherent long-range policy, is an open question. But it is about the maximum the CLC has the bandwidth for at the moment.

I don't think it's an implementation detail at all.

It exposes the implementation detail that GHC chooses to represent constructors with integers assigned in a particular way, and it works on the assumption that this assignment is stable across compiler invocations and also across compiler versions. It's not the deepest magic ever but it does seem deep magic enough that I'm wary of seeing it exposed from base. "It's already exposed from base": sure, but I don't want to perpetuate that. Avoiding exposing DataToTag seems like a practical way of discouraging people using this function from base!

I don't know any package that exports a function whose type constructors are not importable.

Sure, because normally packages expose functions that people are supposed to use. I'm saying that people should not use dataToTag/getTag from base. I'm also open to marking the existing version DEPRECATED.

People who want to use them should import them from ghc-prim.

That might be difficut, because ghc-prim is the stuff we need to build the Integer package(s), which are in turn needed for much of base. So in general it is not necessarily easy to move stuff from base into ghc-prim.

But your MR already exposes them from ghc-prim doesn't it? I'm not trying to make a suggestion that applies in general; I'm trying to make a suggestion that applies in this case.

That is perhpas a further reason to split base into ghc-base and base, with base being the purview of CLC, and ghc-base being implementation details.

Exposing getTag/dataToTag stuff from a package other than base sounds like a good idea to me. Then, if people were to start using it successfully and it proved sufficiently popular, then I think it would be worth considering exposing it from base. But inclusion in base should lag popular usage, not lead it.

To reiterate, this is my personal opinion, not a coherent policy, certainly not CLC policy. Other CLC members may disagree with what I have written here. To help improve the situation in future, regardless of the outcome of this proposal, I think it would be valuable to try to form a coherent policy for base, GHC, and their relationship by continuing discussions such as https://github.com/haskell/core-libraries-committee/issues/105.

simonpj commented 1 year ago

To help improve the situation in future, regardless of the outcome of this proposal, I think it would be valuable to try to form a coherent policy for base, GHC, and their relationship by continuing discussions such as https://github.com/haskell/core-libraries-committee/issues/105.

I would warmkly welcome such a policy, thank you.

simonpj commented 1 year ago

But your MR already exposes them from ghc-prim doesn't it? I'm not trying to make a suggestion that applies in general; I'm trying to make a suggestion that applies in this case.

OK yes, that is true. Fine: I think Matthew and I would be open to what the CLC wants. But I suggest that the alternatives are:

We'll await a resolution from CLC. Thanks!

clyring commented 1 year ago

The advice at the top of GHC.Magic, the current proposed home of DataToTag (and its method dataToTag#), is:

Use GHC.Exts from the base package instead of importing this module directly.

Most of the other modules in ghc-prim have similar disclaimers. So our current messaging seems to suggest that DataToTag should be re-exported from GHC.Exts. Of course, we can break with that existing messaging, but I think we ought to have a somewhat clear vision of what our policy and messaging in this area should be before doing so.

tomjaguarpaw commented 1 year ago

To continue to expose an un-deprecated getTag, with a type that mentions a class that is not importable from base seems inconsistent.

Yes, I appreciate my point of view on this is unconventional and I don't anticipate others in the CLC agreeing with me.

Bodigrim commented 1 year ago

@chessai @emilypi @cgibbard just a gentle reminder to vote.

chessai commented 1 year ago

+1

emilypi commented 1 year ago

+1

Bodigrim commented 1 year ago

Thanks all, 4 votes in favour are enough to approve.

cgibbard commented 1 year ago

+1

The change in type is certainly an improvement here, as at least as I understand it, one could imagine instances of the DataToTag class being handwritten and nothing being out of the ordinary semantically with the new version of the function, so while I would have been in favour of deprecating and removing the previous getTag, I think the new constrained one is probably fine.

chshersh commented 1 year ago

I'm trying to summarise the state of this proposal as part of my volunteering effort to track the progress of all approved CLC proposals.

Field Value
Authors @simonpj, @clyring
Status not merged
base version Unknown
Merge Request (MR) https://gitlab.haskell.org/ghc/ghc/-/merge_requests/8912
Blocked by ???
CHANGELOG entry present
Migration guide missing

Please, let me know if you find any mistakes 🙂


@clyring could you share a status update on the implementation and what are next steps? Also, please do let CLC know if you need any help, so we can coordinate and prioritise approved proposals accordingly!

clyring commented 1 year ago

The underlying primop has some smelly special-cases and gunk near exprOkForSpeculation and in code generation (STG-to-whatever). I've slowly been working on cleaning that situation up, but doing so is no longer a blocker now that there is consensus about the user-facing design. I can probably remove most of the code generation changes from the patch implementing the class so that it can move forward concurrently.

clyring commented 10 months ago

The merge request !8912 implementing this proposal has finally landed.

These changes will appear in base-4.20.0, shipped with ghc-9.10.