haskell / aeson

A fast Haskell JSON library
Other
1.25k stars 321 forks source link

`Generically`-like instances are much slower to compile since 2.0.0.0 #1053

Open amesgen opened 1 year ago

amesgen commented 1 year ago

Consider deriving e.g. FromJSON via Generically or CustomJSON from deriving-aeson (see https://github.com/fumieval/deriving-aeson/issues/16 for the upstream issue). This is currently very slow for large-ish types, e.g.

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingVia #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Aeson
import GHC.Generics (Generic (..))

-- 30 empty constructors
data X = X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | X24 | X25 | X26 | X27 | X28 | X29 | X30
  deriving stock (Generic)
  deriving (FromJSON) via Genericallyish X

newtype Genericallyish a = Genericallyish a

instance (Generic a, GFromJSON Zero (Rep a)) => FromJSON (Genericallyish a) where
  parseJSON value = Genericallyish <$> genericParseJSON defaultOptions value

takes ~22s to compile on my machine using 8.10 and 9.2, and still 10s on 9.6.

But bisection reveals that before #846, this snippet actually compiled very fast, in only 2s on my machine with GHC 8.10! The main reason seems to be that much less code is generated, which is the exact opposite of what #846 was about in the first place. Maybe there is a minimal subset that can be reverted? :thinking:

phadej commented 1 year ago

It doesn't seem to be a problem with #846.

On my machine, with GHC-9.6.2

data X = X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | X24 | X25 | X26 | X27 | X28 | X29 | X30
  deriving stock (Generic)
  deriving (FromJSON) via Generically X

takes 15s to compile, but

instance FromJSON X where
     parseJSON = genericParseJSON defaultOptions

just 2.5

Why DerivingVia Generically is slow, I have no idea, but I doubt it's because of INLINEs (missing elsewhere or added to Generic implementation of FromJSON).

amesgen commented 1 year ago

It doesn't seem to be a problem with #846.

What leads you to that conclusion? As said above, I found this via bisecting, i.e. we have

 $ git log -2 --oneline b5f12d28bcf6533a0e63872a219f780aff6a9451
b5f12d2 Remove -O2 and most of INLINE pragmas
b9c635e Merge pull request #844 from phadej/more-bench-suite

and the snippet I posted above compiles quickly on b9c635e (pre-#846), but slowly on b5f12d2 (the commit from #846). This seems to be a very strong indicator to me that it is the reason for the slowdown, or am I missing something here?


Indeed, it is interesting that this does not seem to be a problem with the Generic machinery per se as "manual" calls to genericParseJSON are fine :+1:

phadej commented 1 year ago

Let's be clear:

Reverting #846 is out of question. INLINE everywhere makes everything compile slowly (incl. manually written instances) and causes code bloat.

The #846 however added INLINE pragmas to Generic implementation functions. Maybe that was a wrong choice (at least it was bad to do in the same commit). But usually it's a right thing to do because we want Generic stuff to optimize away, so inlining aggressively is good default.

That said, aeson Generic deriving is (feature)bloated, and probably optimizer has tough time reducing it. That's why https://github.com/haskell/aeson/issues/933 is on issue tracker. In particular, in this case having a specific newtype for deriving for enumerations would be better for generated code and GHC optimizing (and predictability of what will be generated - adding | X31 Int wouldn't silently change everything).

E.g. in your example aesons Generic bits don't optimize away. (parseJSONList is also making things complicated, I'd be very happy to remove that, making String serialise as array of characters - but that will make a lot of people angry - so we have to pay that price too).

And even than, Generic compiling will always be questionable performance (either at runtime or compile time or both). It's just hard to ensure. TH doesn't have these problems.

So TL;DR, aeson is full of compromises, and #846 is not going to be reverted.

amesgen commented 1 year ago

Sure, wasn't suggesting fully reverting #846; but there still might be a minimal subset that can be reverted that fixes this issue, but does not cause compile-time increases in other areas. I will play around with it a bit, but one potential outcome indeed is "Generic-related stuff is too tricky performance-wise, this issue won't be fixed, at least until sth like #933".

epoberezkin commented 1 year ago

Came here with a very slow/high memory compilation of the type with ~24 constructors...

Moving to TH is not trivial for us too, as we are using a forked version because of that: https://github.com/haskell/aeson/pull/926

Questions:

phadej commented 1 year ago

are you ok with this option in principle

Not really. I don't like the swiss-knife approach current Generic framework uses, and adding more options makes its maintenance even worse.

933 will solve that, but I don't know when I'll have time and motivation to do that. #933 will probably solve this issue too, because simple Generic deriving is... simpler i.e. quicker to compile.

epoberezkin commented 1 year ago

I don't like the swiss-knife approach current Generic framework uses, and adding more options makes its maintenance even worse.

I am usually also against adding options, but the current default simply makes no sense - you get objects for all other constructors and an empty array for an empty one, and where it is needed it simply doesn't work...

The only reason I suggested it as an option was to avoid breaking changes - but would you maybe prefer making it a breaking change? I can't see the use case where the current default is useful, tbh.

phadej commented 1 year ago

but would you maybe prefer making it a breaking change?

No. Breaking change which doesn't cause compiler to scream is a worst possible for the users. The majority of people will be surprised.

It's also not a choice.

data Foo = Foo Int Bar is encoded as an array, so it's natural to encode data Bar = Bar as an empty array.

And from Generics Rep it's impossible to know whether the nullary data type was thought as a product, or a "record":

Prelude GHC.Generics> data Foo = Foo deriving Generic
Prelude GHC.Generics> data Bar = Bar {} deriving Generic
Prelude GHC.Generics> :kind! Rep Foo
Rep Foo :: * -> *
= M1
    D
    (MetaData "Foo" "Ghci2" "interactive" False)
    (M1 C (MetaCons "Foo" PrefixI False) U1)
Prelude GHC.Generics> :kind! Rep Bar
Rep Bar :: * -> *
= M1
    D
    (MetaData "Bar" "Ghci3" "interactive" False)
    (M1 C (MetaCons "Bar" PrefixI False) U1)

Their Rep is the same, no mention of fields (because there aren't any).

The current default makes sense.

epoberezkin commented 1 year ago

data Foo = Foo Int Bar is encoded as an array, so it's natural to encode data Bar = Bar as an empty array.

Right, but not when you have several record constructors, which is more common in real world, with one empty one among them, and all record constructors are encoded as objects, and the empty one as an empty array... What's worse, it's not something that's easy to fix on the consumer side, as it's not a library.

Hence the need to be able to configure it, and it wasn't even difficult.

epoberezkin commented 12 months ago

Now we have a sum type that takes ~8min and 12gb RAM to compile FromJSON in GHC 9.6.2... Sad times.

Any idea if/when it can be resolved? Could we help by sponsoring the solution a bit?

Also, the same question re #926 - any chance we can sponsor it? As the current SingleFieldObject isn't compatible with the main environment that uses it...

phadej commented 12 months ago

Could we help by sponsoring the solution a bit?

You can contact my employer for options. Though I will hesitate to promise anything concrete. At best, you can have your own deriving which is tailored for your needs, i.e. is fast and does exactly what you need and not more. That's a direction I'd like to move aeson anyway (#933) as current configurable kitchen-sink deriving is not going to do what you (and everyone else) want and be fast at compile and generated code.

epoberezkin commented 12 months ago

Thank you!

You can contact my employer for options.

And here I was hoping it's somebody's individual project :)

In any case, moving just couple of types to TH improved compilation time and memory a lot - any known issues with it?

Also, any chance we could support nullaryToObject (#933)? The current default is not usable with Swift, unfortunately.

Also, any pointers where it would fit in TH - a quick glance through the code didn't suggest an obvious way to add it, so just added {_nulary :: Maybe Int} to the empty constructors for now...