Make `($)` fully representation polymorphic

MangoIV commented 1 year ago

Motivation

At the moment the type of ($) is:

($) :: forall r a (b :: TYPE r). (a -> b) -> a -> b

which means that a is not representation polymorphic. It is indeed not possible to make it fully representation polymorphic with the current implementation, which is: f $ x = f x due to us not knowing what representation x would be, see this note in the ghc user's guide, however, in the case of ($) this is an implementation detail, i.e. it could be implemented as ($) f = f which would make it unnecessary to pass a representation polymorphic argument.

For some more motivation on why this is necessary, please refer to this paste.

Proposal

Like this, we could rewrite the ($) to:

($)
  :: forall
    repa
    repb
    (a :: TYPE repa)
    (b :: TYPE repb)
   . (a -> b)
  -> a
  -> b
($) f = f
{-# INLINE ($) #-}

(Indeed, if we could not write ($) like this, we would even fail to prove (lifted) id in Haskell, which would be really bad)

Note I have done impact assessment on the clc-hackage, built with a patched ghc927 I have opened a PR to GHC

Warning as pointed out by @monoidal, this will make ($) more strict, i.e. seq (undefined $) () == () will turn into seq (undefined $) () = _|_ This means, that this proposal is a breaking change This proposal may also affect arity analysis

Alternatives

As pointed out by participants to the conversation, there are multiple alternatives in different directions

Changes to the proposal

make the type ($) :: forall a. a -> a, this will make the function passed to ($) possibly multiplicity polymorphic, this type might however be hard to understand for beginners
make the type also multiplicity polymorphic, I consider this out of scope of this proposal

Alternatives without the proposal

alternative unlifted-base which doesn't require the change to actual base
redefining ($) locally: for this to be convenient, this will require an alternative Prelude, e.g. with cabal mixins and some conveniences will fall away e.g. the special rule that makes ($) impredicative everywhere
not using ($) but parens: inconvenient

Bodigrim commented 1 year ago

Why not ($) = id?

MangoIV commented 1 year ago

I thought it would be nicer not to rely on the properties of id, e.g. if we wanted to change ($) to be multiplicity polymorphic, we would first have to change id to be. Also, same number of chars, so, we neither win nor lose anything.

phadej commented 1 year ago

Why not ($) = id?

I remember discussion why we don't have ($) :: a -> a; ($) = \x -> x. That would solve plenty of problems (e.g. it becomes multiplicity polymorphic as well, i.e. maps ordinary functions to ordinary functions and linear functions to linear functions). https://github.com/ghc-proposals/ghc-proposals/pull/274#issuecomment-529894345 - And I actually don't buy an argument that having ($) with "restricted" type helps type-inference/errors that much.

MangoIV commented 1 year ago

And I actually don't buy an argument that having ($) with "restricted" type helps type-inference/errors that much.

it might be worth considering that there might be beginners who do not immediately understand how ($) :: a -> a can be specialized to the current type of ($). Given that the implementer of the current ($) neither seemed to have realized this, that doesn't seem to be completely absurd.

phadej commented 1 year ago

it might be worth considering that there might be beginners who do not immediately understand how ($) :: a -> a can be specialized to the current type of

Beginner argument can be used to argue against you as well: ($) type should stay at (be reverted to) forall (a b :: Type). (a -> b) -> (a -> b) type. If you need fancier representation, multiplicity, etc generalization, define your own.

E.g. in haddocks it is currently displayed in its full glory Screenshot from 2023-02-11 19-11-49

Terrible for beginners, and you want to make it even worse!

MangoIV commented 1 year ago

E.g. in haddocks it is currently displayed in its full glory

Right, I’d really consider this a problem in the UI of haddock, tbh, given that :t doesn’t yield the most general type without flags, but I get your point.

dixonary commented 1 year ago

To be honest, the haddocks for ($) are already not great even outside of the displayed type signature. It's flatly wrong and confusing to say that ($) is redundant.

Kleidukos commented 1 year ago

@dixonary please ping me when you open a merge request to improve these docs, I'll review it :)

tomjaguarpaw commented 1 year ago

It's flatly wrong and confusing to say that ($) is redundant

Right, one may as well say any function is redundant, since you can replace it with its body.

treeowl commented 1 year ago

I very much like ($) :: a -> a. One thing to look out for is how this change will affect arity analysis. Today, (f $) is analysed as having an arity of at least 1, even if nothing is known about f. If ($) = id, then (f $) = f. That's sure to be an improvement in many cases, but might it be worse in others?

Kleidukos commented 1 year ago

@dixonary documentation ticket opened at https://gitlab.haskell.org/ghc/ghc/-/issues/22963

Vlix commented 1 year ago

I very much like ($) :: a -> a. One thing to look out for is how this change will affect arity analysis. Today, (f $) is analysed as having an arity of at least 1, even if nothing is known about f. If ($) = id, then (f $) = f. That's sure to be an improvement in many cases, but might it be worse in others?

Would that break something like ($ 5) <$> [(5 +), (5 -), (5 *)]? Or does that just get reduced to (\x -> ($) x 5)?

treeowl commented 1 year ago

@Vlix It wouldn't break anything (aside from potential type inference issues). The question is how it could affect performance. It won't change performance in your example.

hasufell commented 1 year ago

aside from potential type inference issues

Seems like a huge footnote.

Wrt the actual proposal, can we get the input from @goldfirere since he's been involved in that?

I also find there's lack of motivation. The discourse thread links to a theoretical code example.

However, that's not really sufficient motivation, IMO. Do users use existing workarounds (such as re-defining ($)) right now due to this issue? What is the actual value, regardless of "we could have been more general"?

Even if it doesn't break anything and has zero performance impact... it will change the type signature of one of the most used functions in Haskell... again. This shouldn't be something we regularly iterate over.

phadej commented 1 year ago

I run into this issue in a non+theoretical example recently.

The definition of ResultType in ghc was changed to be unlifted type (using UnliftedNewtypes and UnboxedSums), the pattern synonyms made it look almost like the previous definition. However I had to change

-fromParseResult dflags $ parseFile (toFilePath modpath) dflags contents'
+fromParseResult dflags (parseFile (toFilePath modpath) dflags contents')

Yes, there's an easy work-around, but I did wonder why the previous definition just not work out of the box.

This shouldn't be something we regularly iterate over.

That's why change to ($) :: a -> a is tempting!

MangoIV commented 1 year ago

I also find there's lack of motivation. The discourse thread links to a theoretical code example.

@hasufell its not purely theoretic (although I guess I should make a better example), I encountered this issue when I tried to pass a Proxy# instead of a Proxy and use ($) for that. So the motivation is not theoretical but really practical. The reason why I even thought about it is because of something I need not kindchecking.

Wrt changing it to a -> a, I just have to reiterate, it's tempting but I think it wouldn't solve as many issues as @phadej is saying at the cost of being really confusing to newcomers. E.g. it would now allow to pass multiplicity polymorphic functions but it would itself not be multiplicity polymorphic. I'd say my proposal is a lot less intrusive than that. I understand that it's annoying that we'd have to change the implementation of ($) but I think the signature as it is now is a bug, it's inconsistent and it's unnecessary and it can cause unintuitive issues.

Also wrt @phadej's point, consider that we, after all, even have monadic versions of some functions that only need Applicative, I think this falls in the same range - although the simplification would be possible, it might cause confusion so we keep it.

MangoIV commented 1 year ago

I yesterday built GHC 9.4.4 with this change, I've yet to try it out on the clc-hackage though, as it doesn't support ghc 9.4.4 yet (https://github.com/Bodigrim/clc-stackage/issues/6).

phadej commented 1 year ago

even have monadic versions of some functions that only need Applicative

This is not as simple as that.

We keep their names.
liftM2 is defined using >>=, which is different than using <$> and <*>; for some Monads that makes a difference.
mapM is a method of Traversable, as there can be more efficient implementation (e.g. Vector)

OTOH, e.g. forever was generalized to use Applicative instead of Monad, as having >> as something else than *> is probably not a good idea.

($) :: a -> a serves the purpose of $ has low, right-associative binding precedence, so it sometimes allows parentheses to be omitted as well.

Instead of thinking of f $ ... like "f applied to ...", you may think it as id f (...) (which is f (...)) but in infix form to omit parentheses.

MangoIV commented 1 year ago

As this was requested by @hasufell, here is the actual motivation for this and the reasoning summarized. I will attach this to the original post.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE TypeFamilies #-}
{-# OPTIONS_GHC -Wall #-}

module Dollar where

import Data.Kind (Type)
import GHC.Exts (Proxy#, TYPE, proxy#)
import Prelude hiding (($))

-- allows @a@ and @b@ to be representation polymorphic
-- doesn't allow the arrow from @a@ to @b@ to be multiplicity
-- polymorphic
($)
  :: forall
    repa
    repb
    (a :: TYPE repa)
    (b :: TYPE repb)
   . (a -> b)
  -> a
  -> b
($) f = f
{-# INLINE ($) #-}

infixr 0 $

-- does allow for a function @a -> b@ to be multiplicity and
-- representation polymorphic, but is itself not multiplicity
-- polymorphic until we have something like @id :: forall m a. a %m -> a@
-- this consideration would go away if we'd implement it like
-- @\x -> x@ (same argument for why to use @($) f = f@ instead of
-- @($) = id@)
dollar :: forall (a :: Type). a -> a
dollar = id
{-# INLINE dollar #-}

infixr 0 `dollar`

-- for a motivation on why this is useful and how I encountered
-- this issue with @($)@, consider the following (more contrived,
-- but less complicated than the original) example

type family TF a

class C a where
  ca :: Proxy# a -> TF a

-- this would err when using the @($)@ provided by Prelude, although
-- it should literally act as  @`id`@
ca' :: forall {proxy} a. C a => proxy a -> TF a
ca' _ = ca $ proxy# @a

https://paste.sr.ht/~mangoiv/6fef7284c59202b63de7f26c962ad605b39f7b7f

MangoIV commented 1 year ago

evil idea to solve @phadej issues - how about additionally proposing to give id an infixr 0?

treeowl commented 1 year ago

@MangoIV some people would prefer an application operator that's infixl 0.

Bodigrim commented 1 year ago

I usually find free-flowing discussions very fruitful, but I know that it can be intimidating / overwhelming for proposers. As a point of order, @MangoIV, you are not obliged to answer every suggestion / generalization / counterproposal.

I yesterday built GHC 9.4.4 with this change, I've yet to try it out on the clc-hackage though, as it doesn't support ghc 9.4.4 yet (https://github.com/Bodigrim/clc-stackage/issues/6).

The latter is not going to change by magic. In the meantime you can backport your changes to ghc-9.2 branch and give it a try.

However, that's not really sufficient motivation, IMO. Do users use existing workarounds (such as re-defining ($)) right now due to this issue? What is the actual value, regardless of "we could have been more general"?

I work a lot with low-level primops and it is very annoying indeed that one cannot use ($) with Int#, Word#, etc. I can vouch that the pain is real.

treeowl commented 1 year ago

@Bodigrim , how much pain would this change alleviate? Wouldn't (.) still fail to work with those things?

hasufell commented 1 year ago

I work a lot with low-level primops and it is very annoying indeed that one cannot use ($) with Int#, Word#, etc. I can vouch that the pain is real.

What do you usually do then? Use parentheses or locally define an operator?

Bodigrim commented 1 year ago

What do you usually do then? Use parentheses or locally define an operator?

I was too stupid to realise that I can redefine ($) locally in a representation-polymorphic way, so I used parentheses.

davean commented 1 year ago

I often have to code around this issue also. Sadly most of the Haskell 'base' library doesn't work well with this stuff, so my code is often very custom.

I don't think you'll see much code that can use this in the wild until you fix the problems because it makes it very expensive to write in Haskell and end up as a custom parallel micro-ecosystem in my experience.

Bodigrim commented 1 year ago

@MangoIV how would you like to proceed? Please raise an MR with your preferred version of ($) and complete impact assessment to learn if any packages are broken by increased polymorphism.

MangoIV commented 1 year ago

Hi, I'd like to make an impact assessment, but should I build that against ghc 927 or am I blocked until clc stackage has been upgraded to 944 or 961 respectively? If so, I guess I'd volunteer helping to fix it.

Bodigrim commented 1 year ago

You can use GHC 9.2.7, I would not expect a material difference for this proposal.

MangoIV commented 1 year ago

I guess this is the "success" screen:

[1 of 1] Compiling Lib              ( src/Lib.hs, /home/mangoiv/Devel/clc-stackage/dist-newstyle/build/x86_64-linux/ghc-9.2.7/clc-stackage-0.1.0.0/build/Lib.o, /home/mangoiv/Devel/clc-stackage/dist-newstyle/build/x86_64-linux/ghc-9.2.7/clc-stackage-0.1.0.0/build/Lib.dyn_o )
gcc: fatal error: cannot execute ‘/nix/store/qm03mh5x2j4ls0y80y8h3zsyj4hb52w2-gcc-11.3.0/libexec/gcc/x86_64-unknown-linux-gnu/11.3.0/collect2’: execv: Argument list too long
compilation terminated.
`cc' failed in phase `Linker'. (Exit code: 1)

How do I proceed?

Bodigrim commented 1 year ago

Yes, that's a "success" :)

Could you please raise a GHC MR with your changes?

chessai commented 1 year ago

I often have to code around this issue also. Sadly most of the Haskell 'base' library doesn't work well with this stuff, so my code is often very custom.

I don't think you'll see much code that can use this in the wild until you fix the problems because it makes it very expensive to write in Haskell and end up as a custom parallel micro-ecosystem in my experience.

Seconded. I've felt this pain very much.

MangoIV commented 1 year ago

I have opened an MR as requested

Bodigrim commented 1 year ago

Dear CLC members. We have all prerequisites for the vote: the proposal, the motivational example, the impact assessment (all green), and the MR. There are also multiple testimonials in this thread from potential users.

Let's keep the floor open until the end of the week and use it as an opportunity to polish the proposal if there are any more questions / suggestions.

monoidal commented 1 year ago

This is making ($) more strict: seq (($) undefined) () will now be _|_ instead of (). This can potentially break existing code but I don't know if it actually does.

MangoIV commented 1 year ago

@monoidal thank you for the note; I built the clc stackage against this change and it didn't report any failure so at least none of the libraries in there rely on this. Can you elaborate why this makes it more strict? I don't think I understand it, tbh, is it generally more strict or is it due to how undefined gets typed in your example (it's itself representation polymorphic afaiu)?

monoidal commented 1 year ago

If you have

f :: Int -> Int
f = f

then seq (newdollar f) () = seq f () = _|_ while seq (olddollar f) = seq (\x -> f x) () = (). Eta expansion changes strictness. This is not related to representation polymorphism of undefined.

Granted, it makes little sense to write ($) f or (f $) since you can just write f. But this is a proposal changing strictness of one of the most used functions, that has been there for more than 20 years. I am not saying we can't do that, but the bar should be high. The fact that nobody has pointed out this fact for a month makes me uneasy.

Also note that the proposal affects VTA, although I'm less concerned about this.

Kleidukos commented 1 year ago

hmm, could we perhaps introduce this change for ($!) then, if the change in strictness definitely affects the runtime semantics of the programs?

monoidal commented 1 year ago

($!) forces its second argument, which means it cannot be made representation polymorphic.

Kleidukos commented 1 year ago

Ah curse. thanks!

MangoIV commented 1 year ago

Eta expansion changes strictness. This is not related to representation polymorphism of undefined.

Right. Now that you say it it's obvious and I feel dumb. Thank you for noticing. This changes it quite a bit. I wonder now why nothing breaks. Are all test suites run when building the clc stackage as prescribed?

MangoIV commented 1 year ago

@monoidal perhaps you know; if this were to break nothing, would it improve performance?

chshersh commented 1 year ago

TIL: I was today-years-old when I learned that you can't define levity-polymorphic variables in Haskell 😮‍💨

Another potential breaking change if someone does something weird with inferring the type of ($) during code generation, and this code does no longer work because the AST of ($) has changed.

Still, I think the benefits of making ($) levity-polymorphic outweigh the drawbacks of supporting one weird use-case nobody probably uses (and unless someone writes a good example of why you may want to write ($) undefined, I'm not convinced it's worth it to support backwards compatibility in this edge case).

So as long as this is documented, I'm +1 on this proposal.

The ($) operator always had a tragic story when it looks like a syntactic sugar that helps removing extra (), in practice it requires special handling in multiple several places (ApplicativeDo, HLS, maybe Haddock, now this levity-polymorphism). I appreciate any proposal that helps dropping f $ x instead of f (x) and make it as invisible to the end user as possible.

On a separate note, the linked GHC User Guide page make it sound like it's impossible to define ($) fully levity-polymorphic, however, as this proposal shows, it is indeed possible with a minor change of semantics. So I would encourage to update the GHC User Guide regardless.

hasufell commented 1 year ago

I'd still like a comment from @goldfirere

jappeace commented 1 year ago

I'd say it's a pretty major concern because it may break programs silently. You could introduce a bug to someone's program after an upgrade and they get no automated warning for that before deploying.

tomjaguarpaw commented 1 year ago

I'm fairly negative on this proposal because I think interested parties should try it in a putative unlifted-base package first, but I'm having trouble thinking of a realistic example where seq (($) f) _ == _|_ is a problem. Does anyone have one?

Bodigrim commented 1 year ago

This is making ($) more strict: seq (($) undefined) () will now be _|_ instead of (). This can potentially break existing code but I don't know if it actually does.

Thanks for pointing this out. It's technically true, but it's just that eta-expansion and eta-reduction change laziness:

> undefined `seq` ()
*** Exception: Prelude.undefined
> (\x -> undefined x) `seq` ()
()

If anyone is at fault here, it is seq, not $.

As a funny manifestation of the similar phenomenon, the following program's output depends on -O:

import GHC.Exts (build)
main = print $ foldr undefined () (build seq) `seq` ()

$ ghc -v0 Lazy.hs && ./Lazy
()
$ ghc -v0 -O Lazy.hs && ./Lazy
Lazy: Prelude.undefined

I'm fairly negative on this proposal because I think interested parties should try it in a putative unlifted-base package first

While recent GHCs extended the variety of unlifted data with {-# LANGUAGE UnliftedDatatypes, UnliftedNewtypes #-}, primitive unlifted types such as Int# are an integral part of base since inception.

hasufell commented 1 year ago

Thanks for pointing this out. It's technically true, but it's just that eta-expansion and eta-reduction change laziness

Sure, but that still seems like a huge caveat for one of the most used functions in Haskell.

I do not feel comfortable changing this behavior.

Bodigrim commented 1 year ago

Can someone demonstrate at least relatively realistic example when this "huge caveat" matters? Do you ever seq functions? Or supply literal undefined as the first argument of $?

parsonsmatt commented 1 year ago

OTOH, e.g. forever was generalized to use Applicative instead of Monad, as having >> as something else than *> is probably not a good idea.

This did result in a space leak in transformers due to (*>) having a default definition of liftA2 (flip const) which would hold on to the value of the first computation until the second had completed - so sometimes trivial improvements like that can have surprising negative effects.

I am +1 on this proposal, including making ($) :: a -> a. Types are a good starting point for documentation, but if the docs for $ are confusing, then we should improve them independently of this - and, as has already been pointed out, it is confusing.

I cannot imagine a scenario where the laziness of $ would matter. $ is almost more of a syntactic construct than a regular function - it's sole purpose is to avoid writing parentheses. If your code somehow relies on f $ g $ x having different laziness behavior than f (g x) then I want to hear about it!

haskell / core-libraries-committee