yesodweb / persistent

Persistence interface for Haskell allowing multiple storage methods.
MIT License

Reducing size of generated code (compile time reduction) #1005

Open MaxGabriel opened 4 years ago

MaxGabriel commented 4 years ago

After the dramatic reduction in compile times introduced by #1003, I'm thinking there may be more gains to be had.

I uploaded the Template Haskell output of persistent-template's test file main.hs, as well as a separate file doing just a mkMigrate, here: https://gist.github.com/MaxGabriel/65027cfcea90b68b50e6ba423c619ded

share [mkPersist sqlSettings { mpsGeneric = False }, mkDeleteCascade sqlSettings { mpsGeneric = False }] [persistUpperCase|
Person json
    name Text
    age Int Maybe
    foo Foo
    address Address
    deriving Show Eq
Address json
    street Text
    city Text
    zip Int Maybe
    deriving Show Eq
NoJson
    foo Text
    deriving Show Eq
|]

share [mkPersist sqlSettings { mpsGeneric = False, mpsGenerateLenses = True }] [persistLowerCase|
Lperson json
    name Text
    age Int Maybe
    address Laddress
    deriving Show Eq
Laddress json
    street Text
    city Text
    zip Int Maybe
    deriving Show Eq
|]

migrate:

share [mkMigrate "getRequiredMigrations"] [persistLowerCase|
Mperson json
    name Text
    age Int Maybe
    address Maddress
    deriving Show Eq
Maddress json
    street Text
    city Text
    zip Int Maybe
    deriving Show Eq
|]

I'm not seeing anything that jumps out as obviously as #1003 did at this point. mkMigrate basically duplicates all the EntityDefs for the liftAndFixKeys trick it does (all the EntityDefs are listed once, then each EntityDef is listed again for each migrate function), so that could be something. (Edit: maybe a Map could be stored, and each call to migrate could just reference a key in the map?)
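
To make the Map idea concrete, here is a minimal sketch in plain Haskell (no Template Haskell). EntityDef' and its fields are simplified stand-ins for persistent's real types, and allEntityDefs/lookupEntityDef are hypothetical names; the point is just that the definitions would be spliced once per share block, with each migrate function referencing keys instead of re-embedding every EntityDef:

```haskell
import qualified Data.Map.Strict as Map

-- Stand-in for persistent's EntityDef, simplified for illustration.
data EntityDef' = EntityDef'
  { entityName   :: String
  , entityFields :: [String]
  } deriving (Show, Eq)

-- Generated once per `share` block, instead of once per migrate function.
allEntityDefs :: Map.Map String EntityDef'
allEntityDefs = Map.fromList
  [ (entityName d, d)
  | d <- [ EntityDef' "Mperson"  ["name", "age", "address"]
         , EntityDef' "Maddress" ["street", "city", "zip"]
         ]
  ]

-- Each generated migrate function would only need to reference a key.
lookupEntityDef :: String -> Maybe EntityDef'
lookupEntityDef name = Map.lookup name allEntityDefs
```

Whether this actually saves compile time would depend on how much smaller the per-migration splices become relative to the cost of the shared map.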

Short of fundamental rethinks, there may be wins to be had by shaving off code bit by bit. For example, the new definition of fromPersistValues is much smaller than it was before 2.8.0, but there's still a lot of duplication: mapLeft, fieldError, the table name, and fromPersistValue are repeated for each field.

fromPersistValues [x1_ausl, x2_ausm, x3_ausn, x4_auso]
        = Person
            <$>
              (Database.Persist.TH.mapLeft
                 ((Database.Persist.TH.fieldError (pack "Person")) (pack "name"))
                 . fromPersistValue)
                x1_ausl
            <*>
              (Database.Persist.TH.mapLeft
                 ((Database.Persist.TH.fieldError (pack "Person")) (pack "age"))
                 . fromPersistValue)
                x2_ausm
            <*>
              (Database.Persist.TH.mapLeft
                 ((Database.Persist.TH.fieldError (pack "Person")) (pack "foo"))
                 . fromPersistValue)
                x3_ausn
            <*>
              (Database.Persist.TH.mapLeft
                 ((Database.Persist.TH.fieldError (pack "Person")) (pack "address"))
                 . fromPersistValue)
                x4_auso
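
For illustration, the repeated prefix could in principle be factored into a local helper so the table name and the mapLeft/fieldError wrapping appear once instead of per column. This is a self-contained sketch with simplified stand-ins for persistent's types (here fromPersistValue just parses Ints, and Point stands in for an entity):

```haskell
-- Simplified stand-ins for persistent's helpers.
mapLeft :: (a -> b) -> Either a c -> Either b c
mapLeft f (Left a)  = Left (f a)
mapLeft _ (Right c) = Right c

fieldError :: String -> String -> String -> String
fieldError table field err =
  "Couldn't parse field `" ++ field ++ "` of `" ++ table ++ "`: " ++ err

fromPersistValue :: String -> Either String Int
fromPersistValue s = case reads s of
  [(n, "")] -> Right n
  _         -> Left ("invalid integer: " ++ s)

data Point = Point Int Int deriving (Show, Eq)

fromPersistValues :: [String] -> Either String Point
fromPersistValues [x1, x2] =
  Point <$> field "x" x1 <*> field "y" x2
  where
    -- The table name, mapLeft, and fieldError are written once here,
    -- instead of being repeated inline for every column.
    field name = mapLeft (fieldError "Point" name) . fromPersistValue
fromPersistValues xs = Left ("expected 2 values, got " ++ show (length xs))
```

As noted in the edit below, this kind of deduplication was tried and did not measurably help, but it shows the shape of the generated code's redundancy.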

Edit: I tried removing the duplication of mapLeft, fieldError, and the table name, but it didn't appear to improve performance. The changes I made are here: https://github.com/yesodweb/persistent/compare/fieldErrorOptimization?expand=1 and benchmarks are here: https://gist.github.com/MaxGabriel/5badd57a3ded7bac2ba43cc3c2a2a3b3

I don't have a great mental model of what reduces compile times besides reducing the raw number of lines/expressions to parse, which isn't ideal. I'm not sure the benchmarks will reliably show micro-improvements, but we'll see. I will probably work on this over time, but I think opening the issue is worthwhile so people are aware of just how much code is generated, what it looks like, and that there is potential for improving it.

MaxGabriel commented 4 years ago

Looking at the definitions of toPersistValue/fromPersistValue, these could probably be reduced to a single function call that only takes the column names.

    instance PersistField RecipientPaymentData where
      toPersistValue
        = \ ent_a5x7G
            -> (PersistMap
                  $ (GHC.List.zip ((GHC.Base.map Data.Text.pack) ["name", "emails"]))
                      (GHC.Base.map toPersistValue $ toPersistFields ent_a5x7G))
      fromPersistValue
        = ((\ x_a5x7H
              -> let
                   columns_a5x7I
                     = unordered-containers-0.2.10.0:Data.HashMap.Strict.Base.fromList
                         x_a5x7H
                 in
                   (fromPersistValues
                      $ (GHC.Base.map
                           (\ name_a5x7J
                              -> case
                                     (unordered-containers-0.2.10.0:Data.HashMap.Base.lookup
                                        (Data.Text.pack name_a5x7J))
                                       columns_a5x7I
                                 of
                                   Just v_a5x7K -> v_a5x7K
                                   Nothing -> PersistNull)
                           $ ["name", "emails"])))
             <=< getPersistMap)

Edit: I have a PR that improves this one: #1014
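
The generated bodies above differ only in their column-name lists, so they could delegate to shared helpers. Here is a self-contained sketch of that idea with a simplified PersistValue; the helper names (entityToPersistMap, keyValuesInOrder) are hypothetical and may not match what #1014 actually does:

```haskell
-- Simplified stand-in for persistent's PersistValue.
data PersistValue
  = PersistText String
  | PersistInt Int
  | PersistNull
  | PersistMap [(String, PersistValue)]
  deriving (Show, Eq)

-- Shared helper for toPersistValue: only the column names vary per entity.
entityToPersistMap :: [String] -> [PersistValue] -> PersistValue
entityToPersistMap names vals = PersistMap (zip names vals)

-- Shared helper for fromPersistValue: pull the columns back out in order,
-- substituting PersistNull for anything missing, exactly as the generated
-- code above does with its HashMap lookup.
keyValuesInOrder :: [String] -> [(String, PersistValue)] -> [PersistValue]
keyValuesInOrder names kvs =
  map (\name -> maybe PersistNull id (lookup name kvs)) names
```

With helpers like these, the per-entity generated instance would shrink to a couple of calls passing only the column-name list.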

MaxGabriel commented 4 years ago

I'm thinking the next step might be to compile with -ddump-timings and see what that suggests about where the most time goes. If it's the simplifier, -ddump-simpl may help with that.

A side thought: the benchmarks should probably also be run with -O0, given that reducing compile time is most valuable during development.

Edits with further ideas:

-ddump-timings output with the -O0 flag (note: the default is -O1, which spends much more time in the simplifier):

Parser [Main]: alloc=3989312 time=1.241
Simplify [expr]: alloc=335992 time=0.613
CorePrep [expr]: alloc=8184 time=0.029
ByteCodeGen [Ghci1]: alloc=32160 time=0.061
Simplify [expr]: alloc=28860640 time=13.433
CorePrep [expr]: alloc=1016512 time=0.725
ByteCodeGen [Ghci1]: alloc=9252432 time=86.203
Simplify [expr]: alloc=304176 time=0.331
CorePrep [expr]: alloc=8184 time=0.023
ByteCodeGen [Ghci1]: alloc=32160 time=0.063
Simplify [expr]: alloc=21300944 time=5.775
CorePrep [expr]: alloc=744528 time=0.357
ByteCodeGen [Ghci1]: alloc=6796824 time=2.484
Renamer/typechecker [Main]: alloc=439201128 time=488.059
Desugar [Main]: alloc=13749672 time=23.781
Simplifier [Main]: alloc=113449920 time=97.598
CoreTidy [Main]: alloc=328647456 time=165.923
CorePrep [Main]: alloc=5248 time=0.015
CodeGen [Main]: alloc=618487888 time=486.540
CorePrep [Main]: alloc=5248 time=0.034
CodeGen [Main]: alloc=650611904 time=408.064

MaxGabriel commented 4 years ago

One discovery from today is that the benchmarks don't tell the whole story as far as compile time goes. For example, I copied in one of our models files from work. In the benchmark it takes something like 19ms, whereas in reality it takes something like 9 seconds (based on summing the output of -ddump-timings).

So I think some testing needs to be done in a more realistic environment. I'm figuring out how to do that.

MaxGabriel commented 4 years ago

I also got a more real-world sample of a file and ran -ddump-timings on it. The most time is spent in CodeGen, then Renamer/typechecker, then CoreTidy.

I'm not familiar enough with GHC to know whether spending time in those phases has any obvious implications. Potential next steps would be to ask people more familiar with the compiler, or maybe to try a profiled build of GHC?

Parser [Mercury]: alloc=1620800 time=0.638
Simplify [expr]: alloc=219296 time=0.421
CorePrep [expr]: alloc=8184 time=0.025
ByteCodeGen [Ghci1]: alloc=32160 time=3.414
Simplify [expr]: alloc=816820344 time=211.062
CorePrep [expr]: alloc=18159280 time=6.274
ByteCodeGen [Ghci1]: alloc=173511768 time=183.381
Renamer/typechecker [Mercury]: alloc=3417988152 time=2505.517
Desugar [Mercury]: alloc=117509960 time=144.366
Simplifier [Mercury]: alloc=913265184 time=789.241
CoreTidy [Mercury]: alloc=2715017448 time=1849.522
CorePrep [Mercury]: alloc=5248 time=0.018
CodeGen [Mercury]: alloc=4724617648 time=2872.675
CorePrep [Mercury]: alloc=5248 time=0.029
CodeGen [Mercury]: alloc=4995066176 time=3268.829

parsonsmatt commented 4 years ago

Yeah, the benchmarks are only useful for determining how long it takes to parse and generate the Haskell code, not how long it takes to compile it. It's possible that the values aren't forced enough and that laziness is being tricky, but I really don't know with Template Haskell :\

MaxGabriel commented 4 years ago

OK, I made a small test project in persistent to test changes to persistent-template. Unfortunately I accidentally rm -rf'd it, but I should be able to recreate it without too much difficulty.

We can test changes in compile time using bench, which calls criterion under the hood. A downside is that this tests the whole compilation, not just the one file we care about.

bench --before="stack clean persistent-performance-test" "stack build persistent-performance-test --ghc-options='-O0 -ddump-timings -ddump-to-file'" --after="cp persistent-performance-test/.stack-work/dist/x86_64-osx/Cabal-2.4.0.1/build/TestPerformance.dump-timings results-no-servant-no-entitypersistfield/`uuidgen`.dump-timings"

To look at a single file, we can have the benchmark copy the timing results for the single file we care about to a folder, then analyze those results. This works as an example:

# Sum the per-phase timings in each .dump-timings file,
# then average the per-run totals.
Dir.chdir('results')
filenames = Dir.glob('*.dump-timings')

totals = []

filenames.each do |name|
  total = 0.0
  File.readlines(name).each do |line|
    # Each timing line ends in "time=<milliseconds>"; skip lines without one.
    _phase, time = line.split("time=")
    total += time.to_f if time
  end
  totals << total
end

mean = totals.sum / totals.size
puts "Mean is #{mean}"

Ideally the analysis of the .dump-timings output would be as good as criterion's (e.g. using an ordinary least squares regression, not just a basic mean).
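
For reference, the core of such a regression is the standard closed-form least-squares fit; criterion's analysis additionally reports goodness of fit and bootstrapped confidence intervals, which this minimal sketch omits:

```haskell
-- Ordinary least squares fit of y ~ intercept + slope * x.
-- A minimal sketch of the closed-form solution; not criterion's actual code.
olsFit :: [Double] -> [Double] -> (Double, Double)
olsFit xs ys = (intercept, slope)
  where
    n     = fromIntegral (length xs)
    meanX = sum xs / n
    meanY = sum ys / n
    -- slope = covariance(x, y) / variance(x)
    slope = sum (zipWith (\x y -> (x - meanX) * (y - meanY)) xs ys)
          / sum (map (\x -> (x - meanX) ^ (2 :: Int)) xs)
    intercept = meanY - slope * meanX
```

Applied here, x could be the run index or iteration count and y the summed dump-timings total, giving a per-run estimate that is less sensitive to constant overhead than a plain mean.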

Based on some simple testing with this, there are minor wins to be had in making certain features optional, for example removing the HttpApiData derivations (used for Servant), or removing the PersistField instances for entities (used to embed entities in other entities as a column).

I'm still digging to see if there are any no-compromise wins. With this setup it should be easy to test some of the other improvements I have ideas about.

parsonsmatt commented 4 years ago

Yeah, in a breaking release of persistent I'd like to drop most of the classes derived by default for each entity. @jessekempf's work towards letting you specify "derive this class for every entity in the definitions" is a huge step towards that.

I'd like to work towards making it easier to define the PersistEntity class by hand, too, so you don't need the QQ at all. I think the QQ is the Right Answer almost all of the time, but it shouldn't be the Only Answer.