Closed nomeata closed 3 years ago
I'm -1 on this change. I have more than 10 years' experience teaching Haskell, and this has never been an issue in teaching. On the contrary, Eq is a simpler case study for introducing laws than waiting until students encounter Applicative etc. I've sometimes found it simpler to think in terms of defining /=. I've never worried about performance because of this issue; there are usually lower-hanging fruits elsewhere.
I'm +1. This is a fantastic simplifying change. There's no reason not to do this, other than extremely trivial breakages. And I would really like to live in a world where we don't let the burden of already-written code get in the way of turning Haskell into the language we want it to be.
This breaks code for no reason. Let's not touch this until we have better support for warning about redundantly defined type class methods, such as the "XOR" MINIMAL-pragma change I suggest in https://mail.haskell.org/pipermail/libraries/2021-October/031490.html
I'm +1 on this. x /= y = not (x == y)
can and should be defined outside of the class.
As it stands, Haskell's type class system has no framework for "class laws": at best, they are guidelines for how types fitting a class should behave. There is no way within a class's definition to guarantee that only types with lawful implementations are bestowed the class. The burden lies on the library designer to guide their users away from footguns; unlawful instances are some of those. By removing (/=) from the class definition and forcing users to write only one method, the opportunity for bugs to arise diminishes, and they are easier to spot when they do.
Also, who really writes Eq instances by hand in production code, anyway? Be honest. The vast, vast majority of users derive Eq for the datatypes they define. And by making Eq a single-method typeclass, this deriving gets even easier. You still get (/=) for free; it just so happens it's not in the class definition.
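The "define it outside the class" suggestion can be sketched as follows. This is a hypothetical single-method class; the names MyEq, (===) and (/==) are made up for illustration and are not the proposed base code:

```haskell
module SingleMethodEq where

-- A hypothetical single-method equality class:
class MyEq a where
  (===) :: a -> a -> Bool

-- Inequality is written once, outside the class, for every instance;
-- no instance can ever make it disagree with (===):
(/==) :: MyEq a => a -> a -> Bool
x /== y = not (x === y)

-- Instances only ever define the one method:
instance MyEq Int where
  (===) = (==)
```

Under this layout, x /== y is not (x === y) by construction, which is exactly the coherence this comment is arguing for.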
Thanks for raising this issue, @nomeata!
To facilitate internal CLC discussions, I’d like to gather some additional input:
- If we were to design Haskell from scratch today, would we prefer one or two members of Eq? Are there arguments to prefer two today?
- Does this change affect educational materials or text books? Could anyone bring specific examples?
- Are proponents of the change willing to facilitate the ecosystem’s migration by raising PRs to affected libraries? Are maintainers willing to accept such PRs? CC @phadej for postgresql-simple, @paul-rouse for mysql-simple, @treeowl for containers.
- What is the impact of this change on GHC and compilation times? CC @AndreasPK.
Concrete example: In a tutorial of my own I explain the concept of default methods when I introduce the Eq class. This section would change, and default implementations would only be explained later, when they first come up. (Unsurprisingly,) I would welcome that change; it would allow me to introduce default methods with a type class where they make more sense (e.g. Ord).
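For reference in the teaching discussion, the status quo two-method class looks roughly like the sketch below. The names MyEq, (===) and (/==) are fresh stand-ins so the block is self-contained; base's real Eq is structured the same way:

```haskell
module TwoMethodEq where

-- A stand-in for today's Eq: two methods, each defaulting to the
-- negation of the other, with a MINIMAL pragma saying either suffices.
class MyEq a where
  (===), (/==) :: a -> a -> Bool
  x === y = not (x /== y)
  x /== y = not (x === y)
  {-# MINIMAL (===) | (/==) #-}

-- An instance may define just one method and gets the other for free:
instance MyEq Bool where
  True  === True  = True
  False === False = True
  _     === _     = False
```

This is the "default methods" concept the tutorial section currently hangs off Eq; under the proposal the same concept would be introduced with a class like Ord instead.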
1 If we were to design Haskell from scratch today, would we prefer one or two members of Eq? Are there arguments to prefer two today?
I'd probably flip a coin. PureScript, for instance, doesn't have /= in the type class: https://pursuit.purescript.org/packages/purescript-prelude/3.1.0/docs/Data.Eq
2 Does this change affect educational materials or text books? Could anyone bring specific examples?
Hutton's Programming in Haskell says in its section 3.9 on basic classes:
Eq - equality types
... using the following two methods:
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
However, the presentation in that chapter just says these functions exist; it doesn't yet speak about defining your own instances.
I don't think that would trap future readers of an old edition of the book who use a newer GHC.
3 Are proponents of the change willing to facilitate ecosystem’s migration by raising PRs to affected libraries?
See below. Though I'm neither a proponent nor an opponent of this change, I just want to dismiss the narrative that a lot of stuff would break. I didn't believe it; judge for yourself.
Are maintainers willing to accept such PRs?
I'd accept a PR, though I'd just do it myself during the usual new-GHC-support work. (In my case, to have a point of comparison, the removal of Option generates/d more work.)
I built the Stackage LTS-18.8 set (which I had around for another experiment). Out of ~2600 packages (the snapshot is slightly larger, but I left out some packages needing native dependencies etc.), the following ~45 needed some changes:
packages: attoparsec-0.13.2.5/
packages: backprop-0.2.6.4/
packages: basement-0.0.12/
packages: basic-prelude-0.7.0/
packages: butcher-1.3.3.2/
packages: clash-lib-1.4.3/
packages: clash-prelude-1.4.3/
packages: clientsession-0.9.1.2/
packages: compdata-0.12.1/
packages: compensated-0.8.3/
packages: conduit-1.3.4.1/
packages: DBFunctor-0.1.2.1/
packages: dimensional-1.4/
packages: eventstore-1.4.1/
packages: fb-2.1.1/
packages: geojson-4.0.2/
packages: ghc-lib-parser-8.10.7.20210828/
packages: haskell-gi-base-0.25.0/
packages: HaskellNet-0.6/
packages: hOpenPGP-2.9.5/
packages: hybrid-vectors-0.2.2/
packages: intro-0.9.0.0/
packages: io-streams-1.5.2.1/
packages: loc-0.1.3.10/
packages: mixed-types-num-0.5.9.1/
packages: multistate-0.8.0.3/
packages: numbers-3000.2.0.2/
packages: numeric-prelude-0.4.3.3/
packages: postgresql-libpq-0.9.4.3/
packages: postgresql-simple-0.6.4/
packages: prelude-compat-0.0.0.2/
packages: primitive-addr-0.1.0.2/
packages: protolude-0.3.0/
packages: relude-0.7.0.0/
packages: sbv-8.15/
packages: semirings-0.6/
packages: singletons-2.7/
packages: snap-core-1.0.4.2/
packages: sqlite-simple-0.4.18.0/
packages: string-class-0.1.7.0/
packages: tfp-1.0.2/
packages: type-level-natural-number-2.0/
packages: type-natural-1.1.0.0/
packages: unboxing-vector-0.2.0.0/
packages: uncertain-0.3.1.0/
packages: unification-fd-0.11.1/
packages: vector-0.12.3.0/
(and mysql-simple, which I didn't build).
Most changes are one-liners: either removing an explicit /= definition, or adding a /= import.
I can upload the patches somewhere if anyone is interested in taking a closer look at them. (I won't submit them upstream, though some are probably good even without the change going in, i.e. removing unnecessary definitions of /=.)
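To illustrate what such a one-liner looks like, here is a made-up instance before and after the typical patch (Foo is a hypothetical type; the patched form compiles both with today's two-method Eq and with the proposed single-method one):

```haskell
module Migration where

data Foo = Foo Int

-- Before the patch an instance might have spelled out the default:
--
--   instance Eq Foo where
--     Foo a == Foo b = a == b
--     x /= y         = not (x == y)  -- redundant restatement of the default
--
-- After the patch only (==) remains; (/=) falls back to the default
-- method today, and to a top-level definition under the proposal:
instance Eq Foo where
  Foo a == Foo b = a == b
```

The "adding a /= import" case is the mirror image: an explicit import list that mentioned (/=) as part of Eq(..) has to mention it on its own instead.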
A few packages are more interesting:
clash-prelude
This is an interesting application; I'd ask whether the change makes things less convenient. The /= is defined to be some "primitive" (which is NOINLINE, and in future OPAQUE, I suppose) for which Clash looks in the Core. That's my understanding.
Alternative or special-purpose preludes: protolude, basic-prelude, haskell-gi-base, relude needed an amendment to keep the re-export list intact. Similarly basement, though it also needed a fix in itself.
GHC(-like) code: clash-lib and ghc-lib-parser; "I have seen this code".
Complicated /= example:
The https://hackage.haskell.org/package/numbers package illustrates the "are == and /= compatible, or do they differ?" problem.
data Interval a = I a a

ival :: (Ord a) => a -> a -> Interval a
ival l h | l <= h    = I l h
         | otherwise = error "Interval.ival: low > high"

instance (Ord a) => Eq (Interval a) where
  I l h == I l' h' = l == h' && h == l'
  I l h /= I l' h' = h < l' || h' < l
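To spell out the incompatibility, here is a self-contained restatement of that instance plus a worked comparison. In this style, (==) means roughly "provably equal (both intervals are the same point)" and (/=) means "provably disjoint", so for overlapping intervals both return False:

```haskell
module IntervalEq where

-- Self-contained copy of the numbers-package style instance:
data Interval a = I a a

instance Ord a => Eq (Interval a) where
  I l h == I l' h' = l == h' && h == l'  -- "provably equal": both are points
  I l h /= I l' h' = h < l' || h' < l    -- "provably disjoint"

-- For the overlapping intervals I 1 3 and I 2 4:
--   I 1 3 == I 2 4        is False  (1 == 4 && 3 == 2)
--   I 1 3 /= I 2 4        is False  (3 < 2 || 4 < 1): overlap proves nothing
--   not (I 1 3 == I 2 4)  is True   (what the default (/=) would answer)
-- so dropping the custom (/=) silently flips /= for overlapping intervals.
```

This is exactly the "do == and /= differ?" problem: under the proposal the three-valued reading of (/=) is no longer expressible through the class.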
https://hackage.haskell.org/package/DBFunctor package has:
-- | Definition of the Relational Data Type. This is the data type of the values stored in each 'RTable'.
-- This is a strict data type, meaning whenever we evaluate a value of type 'RDataType',
-- there must be also evaluated all the fields it contains.
data RDataType =
RInt { rint :: !Integer }
| RText { rtext :: !T.Text }
| RUTCTime { rutct :: !UTCTime}
| RDate {
rdate :: !T.Text
,dtformat :: !Text -- ^ e.g., "DD\/MM\/YYYY"
}
| RTime { rtime :: !RTimestamp }
| RDouble { rdouble :: !Double }
-- RFloat { rfloat :: Float }
| Null
-- | We need to explicitly specify equality of RDataType due to SQL NULL logic (i.e., anything compared to NULL returns false):
-- @
-- Null == _ = False,
-- _ == Null = False,
-- Null /= _ = False,
-- _ /= Null = False.
-- @
-- IMPORTANT NOTE:
-- Of course this means that anywhere in your code where you have something like this:
-- @
-- x == Null or x /= Null,
-- @
-- will always return False and thus it is futile to do this comparison.
-- You have to use the 'isNull' function instead.
--
instance Eq RDataType where
RInt i1 == RInt i2 = i1 == i2
-- RInt i == _ = False
RText t1 == RText t2 = t1 == t2
-- RText t1 == _ = False
RDate t1 s1 == RDate t2 s2 = toRTimestamp (unpack s1) (unpack t1) == toRTimestamp (unpack s2) (unpack t2) -- (t1 == t1) && (s1 == s2)
-- RDate t1 s1 == _ = False
RTime t1 == RTime t2 = t1 == t2
-- RTime t1 == _ = False
RDouble d1 == RDouble d2 = d1 == d2
-- RDouble d1 == _ = False
-- Watch out: NULL logic (anything compared to NULL returns false)
Null == Null = False
_ == Null = False
Null == _ = False
-- anything else is just False
_ == _ = False
Null /= Null = False
_ /= Null = False
Null /= _ = False
x /= y = not (x == y)
I.e. it's a DSL trying to emulate SQL-like processing in Haskell. Not being able to do the weird things SQL does with NULL equality would not be nice.
(A lot more inconvenient than for postgresql-simple, where I actually have no idea what the Eq Null instance is useful for.)
However, when removing the /= definition, none of the tests of DBFunctor failed.
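That the tests still pass is plausible, since only comparisons where the custom (/=) disagreed with the default change behaviour. A minimal sketch of the one case that flips (V is a hypothetical miniature of the NULL-carrying type, not the actual DBFunctor code):

```haskell
module SqlNull where

-- Hypothetical miniature of the SQL-NULL style type:
data V = Val Int | Null

instance Eq V where
  Val a == Val b = a == b
  _     == _     = False  -- anything involving Null compares False

-- With no custom (/=) in the instance, the default not (x == y) applies:
--   Val 1 /= Val 2  =  not False  =  True   (same as before)
--   Val 1 /= Null   =  not False  =  True   (was False with a custom /=)
--   Null  /= Null   =  not False  =  True   (was False with a custom /=)
-- i.e. "x /= Null" stops mimicking SQL and starts answering True.
```

So a test suite only notices if it exercises (/=) against Null specifically, which would explain the green DBFunctor run.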
It would be good to get away from the blatantly unlawful instance in mysql-simple! This predates my time as maintainer, so I can only assume the idea was along the lines of the comment in DBFunctor mentioned above. However, it is a poor reflection of modern SQL semantics, and could be very confusing. I would probably just remove the Eq instance for Null, but it is, unfortunately, exposed to users of the library. I doubt anybody actually uses it, and I've checked that persistent-mysql doesn't, but it's possible that something somewhere would break (I haven't yet tried compiling all of Stackage against this change).
Exactly the same code appears in postgresql-simple, so I assume the same applies there.
However, it is a poor reflection of modern SQL semantics
The poorness there is trying to model 3VL with the two-valued Bool. But I think the 3VL way of presenting NULL comparisons is not particularly new.
https://en.wikipedia.org/wiki/Null_(SQL)#Comparisons_with_NULL_and_the_three-valued_logic_(3VL)
In terms of performance I would expect for real world code:
So from a performance PoV, if we were to design it today, there would be a reason to have it be a single method. But it's unclear whether it's worth breaking packages now for the small benefit it provides.
On 27/10/2021 at 19:45, Bodigrim wrote:
To facilitate internal CLC discussions, I’d like to gather some additional input:
- If we were to design Haskell from scratch today, would we prefer one or two members of |Eq|? Are there arguments to prefer two today?
- Does this change affect educational materials or text books? Could anyone bring specific examples?
- Are proponents of the change willing to facilitate ecosystem’s migration by raising PRs to affected libraries? Are maintainers willing to accept such PRs? CC @phadej https://github.com/phadej for |postgresql-simple|, @paul-rouse https://github.com/paul-rouse for |mysql-simple|, @treeowl https://github.com/treeowl for |containers|.
- What is an impact of this change on GHC and compilation times? CC @AndreasPK https://github.com/AndreasPK.
CC @christiaanb for clash-prelude, @nkarag for DBFunctor.
Using type class backends we could transparently evolve Eq to BackendEq with only one method.
Something like Eq should be as simple as can be imagined. The original motivation says:
Teaching: The Eq class is often the first, or one of the first, type classes that beginners will face.
I find it hard to form a particularly strong opinion on this either way. It seems like a fairly minor breakage for very minor wins. Some thoughts:
I don't weight teaching very heavily in decision-making about Prelude. I think most educational material would be better served by defining a custom prelude, with less polymorphic types, fewer and simpler classes, etc. I'm also perfectly happy to tell students useful lies; for example, I could just present Eq as having one method, and students writing their own instances would be able to do so happily without knowing (/=) is a class method, or even exists!
I'm somewhat sympathetic to the argument that a single method makes law-breaking instances harder, but Eq is rather weak in its laws; in deference to the practically defunct Haskell Report, the Eq laws are explicitly merely "encouraged" (if you'll allow me to side-step Melanie's point that all laws in Haskell are effectively suggestions anyway). I would be more sympathetic to this change if we took a firmer stance on Eq's lawfulness.
Inequality checks are available as a machine instruction for some types, aren't they? Wouldn't that mean some types would have worse performance with this change (an unnecessary function call to not vs. a single instruction)? I confess I'm not very knowledgeable at that level of abstraction.
Right now I'm a very weak -1 on this.
As a Haskell user I probably wouldn’t notice this (I always derive Eq), but back when I was learning Haskell I thought it was pretty neat that the language doesn’t take a stance on whether == or /= is the fundamental operation. It’s a nice way to introduce the notion of a minimal complete definition (i.e. define a method of your choice and get the other one for free).
Really, why is your proposal to remove /= and keep == rather than to remove == and keep /=? What makes one more fundamental than the other? I used to find it quite elegant that we don’t have to make these arbitrary choices (nowadays I don’t care).
To summarise, I’m leaning slightly against this proposal, but wouldn’t lose my sleep over it.
Does the same reasoning apply to (<$)? It is currently defined in the Functor class. If we remove (/=) from Eq, should we remove (<$) from Functor too?
Nope. As the docs for it say: "The default definition is fmap . const, but this may be overridden with a more efficient version." The argument for removing /= is that, because negation of booleans is cheap, there is no more efficient version.
When we add a type class member which can be expressed via others, there is only one possible excuse to do so: one can provide a more efficient implementation than the default. Since this is not the case for (/=), its presence in Eq just opens a window for incoherence without any benefit. Neither (==) nor (/=) is more fundamental than the other, but having them both is asking for trouble.
Given that the migration path is straightforward and backward compatible, and that the maintainers of the SQL libraries above are on board with cleaning up infelicities in instance Eq Null, I'm +1 on this change.
In some settings /= is a primitive operation and == is derived from it.
This is speculation. @nomeata's tests indicate that it's not an issue (compiling Cabal; see the mailing list post https://mail.haskell.org/pipermail/libraries/2021-October/031484.html). @nomeata, have you already run nofib too?
Many people with good judgement believe there is some merit to this proposal. I confess I cannot see what they see. I am very much against churn for the sake of minor improvement. The Haskell community already has a reputation of insufficiently valuing stability.
On the other hand, perhaps this change will have minimal impact. Indeed the only way I could countenance it is if there is no impact on maintainers. That seems to imply that someone must take responsibility for ensuring that the entirety of Hackage builds with this change before the change is actually applied. If that happens then I won't oppose the proposal (not that my voice carries any weight) but I still can't understand the motivation.
@nomeata have you already run nofib too?
No, I did not. I shy away from performance measurements because it's so hard to get them meaningfully right, at least for runtime changes in the sub-percent range.
@nomeata check nofib's perf options. These are good because they don't count wall-clock time:
Usage: nofib-run [-c|--clean] [TEST] [-j|--threads N] [-w|--compiler HC]
[--compiler-arg ARG] (-o|--output DIR) [--cachegrind]
[--cachegrind-arg ARG] [--perf] [[--perf-arg ARG]]
[-s|--speed ARG] [--rts-arg ARG] [-t|--times ARG]
[--keep-going] [--head] [-V|-v|--verbosity ARG]
nofib Benchmark Suite
Available options:
-h,--help Show this help text
...
--perf Run the tests under `perf stat`
I've seen a number of proposals like this on and off recently. I think the reason is likely that there's an energy for "fixing" Haskell, which I think is a great symptom and energy in the community. However, I advocate against this change, not because the change is bad in isolation, but because accepting this proposal would be symptomatic of a poor and inconsiderate design process, and of a lack of unifying philosophy in the CLC.
My understanding is that accepting such a proposal would mean that the CLC is saying "we will accept any improvement to the Haskell core libraries that has a minor cost and is an improvement on the past situation."
For example, we can take a similar proposal here from @isovector, which aimed to export liftA2 and join from Prelude: a seemingly innocent change, just like this proposal from @nomeata. That proposal occupies a similar place to this Eq proposal, namely it is small and not-too-breaking.
But let's assume we accept this change. Now I come along and look at Ord, and I say "huh", why include anything other than <=? I won't argue for or against the content of such a proposal... But note that now we have 3 breaking changes, each come up with "on a whim" (which is not bad), and 3 breaking changes to Prelude that seem uncoordinated (the Eq proposal, which attempts to prevent unlawful Eq, does not do so for Ord... why is that?) and spread across base releases.
Note also that the change to Eq might cause a worse situation... What if the breakage caused by the change to Ord is actually too substantial (I personally once had an unlawful Ord instance) and we decide that that typeclass cannot be broken? Now we have one typeclass that is "right" and one typeclass that is "wrong". That situation is worse than today! Next year we'll have someone propose including /= in Eq ;)
The point being, we shouldn't accept that these proposals would each require more files to have CPP code in them without really auditing base and doing such a change in one fell swoop. Plus, we should really consider developing a migration tool for Haskell (and non-text-based conditional compilation), because these problems will keep coming up, and library maintainers will eventually be unable to use linters if we don't give them the attention they deserve.
In sum, such a change cannot and should not be an isolated change. The real proposal as I see it is "Remove default typeclass methods from typeclasses where their implementation can upset the class's laws". That is a very reasonable proposal, whose actual costs we can discuss fully by inspecting all unlawful classes and a subsequent migration strategy. That way, we can also talk about three or four "principles" being fixed in base in a major version instead of a hodge-podge of unrelated, but each seemingly reasonable, whack-a-mole updates!
@santiweight's comment is in line with my recent thoughts here. There are lots of really good changes we should make to base, all of which keep getting shot down for exactly the same reason of "too much churn." But that doesn't mean we shouldn't do them! It just means we should pick some particular time/release and put all of our big breaking changes into that basket.
Of course, a better solution would be to unprivilege base so that it's unrelated to GHC versions, and then just use semver for it like we do with literally every other package.
In sum, such a change cannot and should not be an isolated change. The real proposal as I see it is "Remove default typeclass methods from typeclasses where their implementation can upset the class's laws".
But *it is not*. The proposal says
(The Ord class is much better in that regard, for example (<=) can be implemented more efficiently than compare.)
And as others have pointed out, it's difficult to find cases where /= is more efficient than ==.
The examples brought by @effectfully all come with a disclaimer: "(I don't really use the Eq class there)".
and then just use semver for it like we do with literally every other package.
PVP is a semantic versioning scheme. There are problems:
- base (and template-haskell) are the only ways to imply GHC version support in cabal files. There are changes which are not library ones (like the recent changes in how RankNTypes works, simplified subsumption).
- base is a huge API surface; in 6 months it almost by necessity gets some breaking change in a corner, like "Add a Typeable constraint to fromStaticPtr in the class GHC.StaticPtr.IsStatic".
Otherwise I agree: a base which is decoupled from the GHC release cycle would be great, but the infrastructure is not there yet. It cannot be done any time soon.
It would greatly reduce the compatibility burden: it would be enough to make a version of base compatible with the 3 latest GHCs, and not to worry that downstream code could be written against three different bases.
But those are all just wishes. If you don't like the churn, please push for solving the underlying problems, not blocking changes in the meantime.
@phadej I don't think I made my point totally clear in that regard. The point is that removing <= or compare would each have their arguments, and the fact that the removal of one or the other could be a reasonable proposal means we will always be catching up (addressing new proposals etc.) if we don't first decide what principle guides the design of such proposals, so that we can easily say "this proposal fails on this principle", for example. As it stands, each time a proposal comes out we start from 0 and the discussion repeats itself.
I would like some large-scale coordination on how to make a decoupled base a reality. Otherwise we get these massive threads with no easy answers, and the CLC can only easily do things at the margins.
I think we can do it today with CPP, and then we will want to invest in things like better orphans and better backpack to make the interfaces cleaner.
But those are all just wishes. If you don't like the churn, please push for solving the underlying problems, not blocking changes in the meantime.
I am pushing on the underlying problems! I am currently trying to come up with a non-text-based pre-processor/conditional-compilation extension that would allow a migration tool to be written for changes to namespaces etc. I posted in the Slack channel yesterday, in fact, and (once some current personal life stuff is done) I will try to get some implementation over Christmas. But the point also stands that this proposal's acceptance is worrying and a real-world slippery slope (slippery slopes are valid arguments if they happen), and I am personally not trying to "block changes" but instead to protect users, which is, after all, the whole point of the core libraries. The core principle here is not "make states unrepresentable" but "make users happy and productive"; the second principle implies the first only some of the time, for the reasons outlined above and in the proposal.
@santiweight
I don't think I made my point totally clear in that regard, the point being that removing <= or compare would each have their arguments,
No, they won't. compare and <= have more efficient implementations. And GHC derives them:
{-# OPTIONS -ddump-deriv #-}
module MyMaybe where

data MyMaybe a = MyNothing | MyJust a
  deriving (Eq, Ord)
==================== Derived instances ====================
Derived class instances:
instance GHC.Classes.Eq a =>
GHC.Classes.Eq (MyMaybe.MyMaybe a) where
(GHC.Classes.==) (MyMaybe.MyNothing) (MyMaybe.MyNothing)
= GHC.Types.True
(GHC.Classes.==) (MyMaybe.MyJust a1_a2cD) (MyMaybe.MyJust b1_a2cE)
= ((a1_a2cD GHC.Classes.== b1_a2cE))
(GHC.Classes.==) _ _ = GHC.Types.False
instance GHC.Classes.Ord a =>
GHC.Classes.Ord (MyMaybe.MyMaybe a) where
GHC.Classes.compare a_a2cF b_a2cG
= case a_a2cF of
MyMaybe.MyNothing
-> case b_a2cG of
MyMaybe.MyNothing -> GHC.Types.EQ
_ -> GHC.Types.LT
MyMaybe.MyJust a1_a2cH
-> case b_a2cG of
MyMaybe.MyJust b1_a2cI -> (a1_a2cH `GHC.Classes.compare` b1_a2cI)
_ -> GHC.Types.GT
(GHC.Classes.<) a_a2cJ b_a2cK
= case a_a2cJ of
MyMaybe.MyNothing
-> case b_a2cK of
MyMaybe.MyNothing -> GHC.Types.False
_ -> GHC.Types.True
MyMaybe.MyJust a1_a2cL
-> case b_a2cK of
MyMaybe.MyJust b1_a2cM -> (a1_a2cL GHC.Classes.< b1_a2cM)
_ -> GHC.Types.False
(GHC.Classes.<=) a_a2cN b_a2cO
= GHC.Classes.not ((GHC.Classes.<) b_a2cO a_a2cN)
(GHC.Classes.>) a_a2cP b_a2cQ = (GHC.Classes.<) b_a2cQ a_a2cP
(GHC.Classes.>=) a_a2cR b_a2cS
= GHC.Classes.not ((GHC.Classes.<) a_a2cR b_a2cS)
Note: there is no /= definition; the default is used. There are both compare and <.

is plain wrong, as in certain rare cases there's no sensible way of expressing /= via == at all, while it works perfectly well the other way around.
Please write down your example, so it's not a speculative "in some rare cases I almost ran into in a private codebase". Provide a minimal example.
I don't understand your DSL-ish example:
If I was not to compile the expression right after turning a constructor representing == into a constructor representing /= and instead used an intermediate representation,
How do you get there? == has to return Bool, not some AST node type.
How do you get there? == has to return Bool, not some AST node type.
Right, I got myself confused. Thank you for catching that. I dismiss my nonsensical points. I'll delete the noise.
@phadej I appreciate the feedback. I don't want to deny the fact that clearly Ord is different from Eq, as you presented. Thank you for explaining that to me.
I don't want us to accidentally hijack the thread, but I could equally go ahead and bring up the point for other typeclasses. How about Enum: let's kill succ, pred, etc. How about Monad of no return: that is the same issue as this current proposal, just with a higher cost. Is the argument purely one of efficiency? Why then is even :: Integral a => a -> Bool not defined in the type class, even though it has a faster implementation for binary representations than for other representations? Why is (*>) :: f a -> f b -> f b in Applicative instead of defined outside of the typeclass? I don't know that anyone has ever defined their own version of *>...
Anyway, the fact that we are having this conversation, and that I am not being linked to some documentation saying "This is when unlawfullable typeclasses are allowed", is the evidence I need to suggest that we have an underlying problem that we will all continually waste time on if we can't cache all the points and criticisms in one place.
I don't know that anyone has ever defined their own version of *>...
This is becoming very tangential now, but they are treading on dangerous ground if they don't. See "forever contains a space leak".
@tomjaguarpaw I definitely agree. Let me summarise my aspect of the feedback (besides thinking that the migration is more costly than I see acknowledged) as this:
Anyway, the fact that we are having this conversation, and I am not being linked to some documentation saying "This is when unlawfullable typeclasses are allowed" [is the real issue here]
I think the CLC could come up with some such principles and guidelines so that this discussion can focus on a narrower part of the massive design space that is base.
about Enum - let's kill succ pred etc.
Enum is a bad class, but:
Prelude> succ (2 ^ 128 :: Integer)
340282366920938463463374607431768211457
Prelude> toEnum (fromEnum (2 ^ 128 :: Integer) + 1) :: Integer
1
(fromEnum goes through Int, so 2 ^ 128 wraps around before toEnum converts back.)
How about Monad of no return - that is the same issue as this current proposal just that the cost is higher.
Almost, except return has to be pure. (And mappend has to be <>.)
Why then is even : Integral a => a -> Bool not defined in the type class, which has a faster implementation for binary representation than for other representations?
It could. But there is the Bits class, though.
I don't know that anyone has ever defined their own version of *>...
https://hackage.haskell.org/package/tagged-1.8.6.1/docs/src/Data.Tagged.html#line-250
instance Applicative (Tagged s) where
  pure = Tagged
  {-# INLINE pure #-}
  Tagged f <*> Tagged x = Tagged (f x)
  {-# INLINE (<*>) #-}
  _ *> n = n
  {-# INLINE (*>) #-}
@santiweight, Data.Sequence has custom implementations of both *> and <* that produce asymptotically more compact result structures than the default definitions.
As for Enum: it's a disaster of a class, and I for one would welcome a complete replacement.
@santiweight Ah, I think I understand. You are saying that the CLC should elaborate some general principles around the design of type classes in base. Then we should systematically move towards fulfilling those principles. This particular proposal would become merely an artefact of that process. I am inclined to agree. Rather than having a one-off discussion about whether removing (/=) pulls its weight, where analogies to (<=) are buried, let's hash out the general principles in one go. Sounds appealing to me!
Thanks guys. Again, the feedback is very much appreciated and, I promise, very well received. I'll just reiterate the point that this "small" proposal should come with fixing all of these issues under a global base audit, instead of the whack-a-mole.
I promise, for example, that when Enum gets its own proposal, you will all (and some other newbie like myself) waste your time explaining the subtleties of each typeclass and the design decisions that are pretty obviously non-trivial. If people want to accept this proposal, I hesitate for the other reasons I described above regarding compatibility and frequency of changes.
But this conversation we just had will be had again if we don't address the larger issue of not having a centralised idea behind what the actual fuck base is meant to be! And what the hell amount of breakage is permitted, and for how much gain! It's not clear to me at all!
@tomjaguarpaw That is exactly my point.
Control.Monad.ST.Lazy also has a custom *> for performance reasons, as I recall; lots of things there can't inline because of delicate concurrency issues.
(Off-topic: @treeowl It's absolutely essential for almost every Applicative to have a specialised *> implementation, in order to avoid space leaks.)
I've proposed a separate discussion so that we can move this high-level discussion out of this precise and well-defined change. Sorry for the hijack @nomeata!
The point being, we shouldn't accept that these proposals would each require more files to have CPP
This proposal will not require CPP: the changes it would force libraries to make are fully backwards compatible with previous versions of base. And in fact, library authors could (should?) make them already now: just don't define (/=) (and if you import it explicitly, do it on its own, not as part of Eq((/=))).
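That import advice can be made concrete. Assuming a hypothetical module that wants (==) and (/=) in scope explicitly, the forward-compatible spelling imports (/=) as a plain name rather than as part of the class:

```haskell
{-# LANGUAGE NoImplicitPrelude #-}
module ImportStyle where

-- Fragile spelling: ties (/=) to being a class method of Eq.
--   import Prelude (Eq ((==), (/=)))
--
-- Forward-compatible spelling: imports (/=) as a plain name, which
-- resolves whether it is a class method or a top-level function:
import Prelude (Bool, Eq ((==)), (/=))

sameOrNot :: Eq a => a -> a -> (Bool, Bool)
sameOrNot x y = (x == y, x /= y)
```

Both spellings compile against today's base; only the second keeps compiling if (/=) leaves the class, which is why no CPP is needed.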
Thanks for asking for the underlying principle! That's a good point. I think it might be: "don't have methods with a default implementation if there are no good reasons to ever instantiate them explicitly", to reduce unnecessary cognitive load, i.e. don't even bother users of the type class with making that decision.
I've proposed a separate discussion so that we can move this high-level discussion out of this precise and well-defined change
... and it can be found at https://github.com/haskell/core-libraries-committee/issues/12
As a point of order, I would like to urge participants to discuss the proposal at hand, not other subjects, however appealing they are.
:+1: if you can explain how maintainers will be helped to deal with the breakage. :-1: if it's just considered "their job".
I have run nofib perf tests using current `master` and Joachim's `single-method-eq` branch. `master` is vanilla master; `master2`, `single-eq`, and `single-eq2` are run with the following patch:
diff --git a/real/fulsom/Interval.hs b/real/fulsom/Interval.hs
index e0fa79c..94b32b2 100644
--- a/real/fulsom/Interval.hs
+++ b/real/fulsom/Interval.hs
@@ -33,7 +33,7 @@ a # b = a :#: b
instance (Ord a, Eq a) => Eq (Interval a) where
a == b = a >= b && a <= b -- Not correct - but it will do.
- a /= b = a > b || a < b
+-- a /= b = a > b || a < b
instance (Ord a) => Ord (Interval a) where
diff --git a/real/hidden/Numbers.hs b/real/hidden/Numbers.hs
index 9059a00..42ffa01 100644
--- a/real/hidden/Numbers.hs
+++ b/real/hidden/Numbers.hs
@@ -6,7 +6,7 @@ data Number = Tolerant Float
-}
instance Eq Number where
Tolerant a == Tolerant b = abs(a-b) < eps
- Tolerant a /= Tolerant b = abs(a-b) > eps
+-- Tolerant a /= Tolerant b = abs(a-b) > eps
instance Ord Number where
Tolerant a <= Tolerant b = a-eps < b
Tolerant a < Tolerant b = a < b-eps
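(Incidentally, the `Tolerant` instance patched above is an example of an unlawful `(/=)`: with eps-based comparison, the original `(==)` and `(/=)` can disagree with the law `x /= y = not (x == y)` when the distance is exactly `eps`. A self-contained sketch; the `eps` value here is an assumption, not the one from nofib:)

```haskell
-- Sketch of the unlawful instance from real/hidden/Numbers.hs;
-- eps is chosen here purely for illustration.
eps :: Float
eps = 0.1

data Number = Tolerant Float

instance Eq Number where
  Tolerant a == Tolerant b = abs (a - b) < eps
  Tolerant a /= Tolerant b = abs (a - b) > eps
-- When abs (a - b) == eps, both (==) and (/=) return False,
-- so x /= y differs from not (x == y): the Eq law is broken.
```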
Highlights:
|                               | master/     | std. err. | master2/ (rel) | std. err. | single-eq/ (rel) | std. err. | single-eq2/ (rel) | std. err. |
+===============================+=============+===========+================+===========+==================+===========+===================+===========+
| imaginary/digits-of-e1        | 4.898e9     | 0.0%      | +0.11%         | 0.0%      | +12.93%          | 0.0%      | +12.85%           | 0.0%      |
| real/eff/VSD                  | 6.791e7     | 0.0%      | +24.00%        | 0.0%      | +28.52%          | 0.0%      | +24.00%           | 0.0%      |
| smp/callback001               | 7.371e9     | 0.0%      | +171.92%       | 0.0%      | -77.81%          | 0.0%      | -78.11%           | 0.0%      |
| spectral/puzzle               | 6.600e9     | 0.0%      | -0.26%         | 0.0%      | -13.37%          | 0.0%      | -13.70%           | 0.0%      |
+===============================+=============+===========+================+===========+==================+===========+===================+===========+
| geom mean                     |             |           | +0.73%         |           | -0.50%           |           | -0.70%            |           |
I haven't looked at these programs to see whether there's something obvious that would explain such an increase or decrease in instruction count. This nofib run also suggests that whether `Eq` is a single-method class or not is not a performance-critical decision.
Full report:
Thanks for running that!
Yeah, the joy of benchmarking… The code of `smp/callback001` is unchanged between `master` and `master2`, right? That seems to indicate that something is seriously wrong with the result :-(
@nomeata
The code of smp/callback001 is unchanged between master and master2

Well, the RTS and `base` are changed, but the benchmark itself is not.
The benchmark begins with:
{-# LANGUAGE ForeignFunctionInterface #-}
-- This benchmark is also ffi014 in the test suite.
-- This program behaves unpredictably with the non-threaded RTS,
-- because depending on when the context switches happen it might end
-- up building a deep stack of callbacks. When this happens, the run
-- queue gets full of threads that have finished but cannot exit
-- because they do not belong to the topmost call to schedule(), and
-- the scheduler repeatedly traverses the run queue full of these
-- zombie threads.
so that is maybe what happened.
So I originally thought that using `\x y -> not (x == y)` in the scrutinee of a case expression would result in a performance degradation, because the `not` can neither be absorbed into the equality test, nor into the case expression (by switching the alternatives around); i.e. the case expression corresponding to `not` would remain.
So I created the following to test my hypothesis:
{-# OPTIONS_GHC -O -ddump-simpl -ddump-to-file #-}
import Criterion
import Criterion.Main
import Data.Word
filterX :: (a -> Bool) -> [a] -> [a]
filterX _pred [] = []
filterX pred (x:xs)
| pred x = x : filterX pred xs
| otherwise = filterX pred xs
{-# NOINLINE filterX #-}
eq1 :: [Word16] -> Int
eq1 (x:xs) = length (filterX (/= x) xs)
{-# NOINLINE eq1 #-}
eq2 :: [Word16] -> Int
eq2 (x:xs) = length (filterX (not . (== x)) xs)
{-# NOINLINE eq2 #-}
eq1Bench :: Benchmark
eq1Bench = env setup $ \m ->
bench "(/= x)" (nf eq1 m)
where
setup = return [0 .. 4000]
eq2Bench :: Benchmark
eq2Bench = env setup $ \m ->
bench "(not . (== x))" (nf eq2 m)
where
setup = return [0 .. 4000]
main :: IO ()
main =
defaultMain
[ eq1Bench
, eq2Bench
]
And sure enough, if we look at the Core, we see that the case expression corresponding to `not` isn't optimized away in the worker for `eq2`. The `not` is, however, inlined into the equality operator for the lifted `Word16` arguments, and we see that the branches of the case expressions in the worker for `eq2` are switched around:
Main.$weq1
= \ (w_s4Z9 :: [Word16]) ->
case w_s4Z9 of {
[] -> case lvl1_r51e of wild1_00 { };
: x_a1HD xs_a1HE ->
GHC.List.$wlenAcc
@ Word16
(filterX_r1jq
@ Word16
(\ (ds_d36N :: Word16) -> GHC.Word.neWord16 ds_d36N x_a1HD)
xs_a1HE)
0#
}
Main.$weq2
= \ (w_s4Ze :: [Word16]) ->
case w_s4Ze of {
[] -> case lvl3_r51g of wild1_00 { };
: x_a2CH xs_a2CI ->
GHC.List.$wlenAcc
@ Word16
(filterX_r1jq
@ Word16
(\ (x1_a3er :: Word16) ->
case x1_a3er of { GHC.Word.W16# x2_a3i6 ->
case x_a2CH of { GHC.Word.W16# y_a3i9 ->
case GHC.Prim.eqWord# x2_a3i6 y_a3i9 of {
__DEFAULT -> GHC.Types.True;
1# -> GHC.Types.False
}
}
})
xs_a2CI)
0#
}
However, actually running the benchmark gave the following surprising result:
benchmarking (/= x)
time 29.01 μs (28.95 μs .. 29.07 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 29.10 μs (29.07 μs .. 29.15 μs)
std dev 131.3 ns (96.10 ns .. 191.5 ns)
benchmarking (not . (== x))
time 26.85 μs (26.81 μs .. 26.90 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 26.85 μs (26.83 μs .. 26.88 μs)
std dev 76.05 ns (47.50 ns .. 117.4 ns)
The version with the call to `eqWord#` is faster than the single call to `neWord16`. Perhaps reading the Cmm can give additional insight into why the `neWord16` version is slower, but I personally can't read Cmm well. I'm also surprised that `neWord16` isn't inlined, even though it's saturated and there is a `{-# INLINE [1] neWord16 #-}` pragma.
@Bodigrim suggested I take the proposal that I recently submitted to `libraries@` here, to play guinea pig for this process.

What

I suggest to change the `Eq` class so that `(==)` is its only method, and to turn `(/=)` into a normal function.
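To make the shape of the change concrete, here is a sketch using a stand-in class name `MyEq`, so that the snippet compiles alongside the Prelude's real `Eq`:

```haskell
-- Stand-in for the proposed single-method Eq; renamed so this
-- snippet does not clash with the Prelude's Eq class.
class MyEq a where
  eq :: a -> a -> Bool   -- the only method, analogous to (==)

-- The analogue of (/=) becomes an ordinary top-level function,
-- so no instance can redefine it: the law holds by construction.
neq :: MyEq a => a -> a -> Bool
neq x y = not (eq x y)

instance MyEq Bool where
  eq True  True  = True
  eq False False = True
  eq _     _     = False
```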
Why

If we'd define `class Eq` now, we wouldn't include `(/=)`.

In general, the motivations for “redundant” methods with default implementations could be that the method can sometimes be implemented more efficiently, or that it can sometimes be more natural to implement directly. Here, neither of these applies (justification below), and “fixing” this is worth it, for the following reasons:
- Teaching: The `Eq` class is often the first, or one of the first, type classes that beginners will face. This means that educators have to explain what default methods are and why they exist, and then have to apologetically say “but this doesn't really apply here, sorry”. (The `Ord` class is much better in that regard; for example, `(<=)` can be implemented more efficiently than `compare`.)
- Development cost and effort: Every developer implementing an `Eq` instance has to decide whether to instantiate `(/=)`. They have more documentation to read. And precisely because `(/=)` doesn't really make sense here, they might have to think extra long about whether to instantiate it or not. Removing the method will save future developers some time.
- Similarly, whenever someone reads code and comes across an `Eq` instance with `(/=)` defined explicitly, they have extra work to do, thinking “Why was this defined?”, “Is it lawful?”. Removing the method from the class altogether will mean library code gets simpler and easier to read; libraries will have less code, not more.
- Lawfulness: `Eq` says `x /= y = not (x == y)`, but that is only true if everybody plays by the rules. By having `(/=)` in the class, it is possible for instances to be unlawful (intentionally or accidentally), for little gain. Removing the method will guarantee that equation.
- Code changes are backward compatible: This change will require some library maintainers to change their code. But the change is simple (remove the method, possibly adjust the `import` statement) and is compatible with old versions of `base`. No complex migration strategy, no CPP needed.
- Performance gains (very minor): Single-method classes are currently represented more efficiently in GHC, so in some higher-order cases removing the method might speed up programs. Also, compiling `base` and libraries with `Eq` instances, and creating their haddocks, gets (very slightly) faster, because there is less to do.
- Implementation in `base` is simple: see https://gitlab.haskell.org/ghc/ghc/-/merge_requests/6793.

Why not (and rebuttals)
Most of these are taken from the `libraries` thread; rebuttals are mine.

- `(/=)` can sometimes be implemented more efficiently.
  Rebuttal: Beyond the cost of a single call to `not`, which GHC often optimizes away anyway, this is not possible (for law-abiding instances).
- `(/=)` can sometimes be more natural to implement.
  Rebuttal: Relatively unlikely, and possibly more error-prone and harder to read, due to the many negations.
- There is a one-time cost for affected library authors, and it's not worth it. A mildly naive Hackage search shows 131 affected packages.
  Rebuttal: The change is easy, mechanical and backward compatible. More importantly: because this change simplifies both `base` and the code of affected libraries, it will pay off eventually, through all the work we (as a community) save over time by no longer having to think about `(/=)`. In fact, the evidence that 131 packages' authors manually (and, in most cases, pointlessly) wrote a `(/=)` means this proposal will save the next 131 packages' authors, and beyond, unnecessary work.
- Some libraries use this to write unlawful instances, e.g. https://github.com/haskellari/postgresql-simple/issues/78.
  Rebuttal: Don't write unlawful instances. If you need `(/=)` to behave differently from the negation of `(==)`, `Eq` is the wrong type class. I understand that this is a bit harsh, as the library authors might have reasons to break lawfulness.
- RULE matching on single-method class methods doesn't work reliably, but does work for other classes.
  Rebuttal: I consider that a bug in GHC and would fix it one way or another before enacting this proposal.
- We should clean up the type class hierarchy in a bigger way.
  Rebuttal: Maybe, but that should not block the small cleanup proposed here.