ku-fpg / hermit-shell

HERMIT with GHCi shell
BSD 3-Clause "New" or "Revised" License

Compilation Regression with GHC 7.10.3 #39

Closed ecaustin closed 8 years ago

ecaustin commented 8 years ago

There is a massive increase in the amount of memory required to compile hermit-shell with GHC 7.10.3, to the point that it is crashing the VM I'm using.

The memory usage skyrockets when compiling the HERMIT.RemoteShell.Orphanage module which leads me to believe that the problem may be related to this GHC/binary bug: https://ghc.haskell.org/trac/ghc/ticket/9630

Does anyone have time to confirm this and possibly replace some of the generic instances with manually written ones as a short term fix?

I'd volunteer to do it myself, but my hands are tied for the time being.
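
For reference, a hand-written instance pair of the kind suggested above might look like the following minimal sketch. The `ShellCommand` type and its JSON field names are hypothetical and are not taken from HERMIT.RemoteShell.Orphanage; the point is only that nothing here goes through GHC.Generics, so the compiler never has to optimize a generic representation.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.Aeson
  ( FromJSON (..), ToJSON (..), decode, encode, object, withObject, (.:), (.=) )

-- Hypothetical datatype, used only for illustration.
data ShellCommand = ShellCommand
  { commandName :: String
  , commandArgs :: [String]
  } deriving (Eq, Show)

-- Hand-written encoder: builds the JSON object field by field.
instance ToJSON ShellCommand where
  toJSON (ShellCommand name args) =
    object ["name" .= name, "args" .= args]

-- Hand-written decoder: reads the same two fields back.
instance FromJSON ShellCommand where
  parseJSON = withObject "ShellCommand" $ \o ->
    ShellCommand <$> o .: "name" <*> o .: "args"

main :: IO ()
main = do
  let cmd = ShellCommand "display" ["--verbose"]
  -- Round-trip check: decoding the encoded value should give it back.
  print (decode (encode cmd) == Just cmd)
```

The cost is the usual one for manual instances: every field has to be kept in sync by hand when the datatype changes.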

ecaustin commented 8 years ago

I should note that this problem exists for all versions of GHC 7.10.x based on my limited testing, but is particularly bad for GHC 7.10.3 for whatever reason.

I can at least get compilation to finish with GHC 7.10.2, so I've rolled back to that version for now.

andygill commented 8 years ago

@roboguy13 David - is this something you could look at?

RyanGlScott commented 8 years ago

Urgh - I'm 99% sure this is an aeson issue. For whatever reason, default implementations of GHC generics-related functions don't inline very well, and if you force them to be inlined via INLINE pragmas, they can result in insane memory usage. pandoc-types has also experienced a similar issue, and their workaround was to roll back from aeson-0.10 to aeson-0.9.

I don't know the root cause of the issue, but I speculate that if they removed all of the INLINE pragmas from the generic class methods in Data.Aeson.Types.Generic, they might not have such a bad time. Unfortunately, @bos is pretty hard to get ahold of, so I don't know how soon this issue will be addressed...

xich commented 8 years ago

The aeson dep in this cabal file already limits to <0.10. Maybe try forcing back to 0.8? Could also just try splitting up that module into one-per-datatype and see if that helps.

Workarounds, of course, but might be easier than writing instances by hand.

ecaustin commented 8 years ago

I can confirm that I am using aeson-0.10 at this point. More specifically, it's the master branch from github because I was tired of running into this warning: https://github.com/bos/aeson/issues/290

Shame on me for changing the cabal file, but it seemed like the easier solution than manually inlining the default implementation everywhere.

That being said, I'm pretty sure I was having the same problems with aeson-0.9.0.1, but I'd be happy to retest using that and earlier versions.

ecaustin commented 8 years ago

Maximum memory residency during compilation with GHC-7.10.2 and aeson-0.8.1.1 was ~1.4GB on my machine, which is only marginally less than with aeson-0.10.

Once Drew wraps up porting HERMIT to GHC 8, I'll repeat the tests with it.

RyanGlScott commented 8 years ago

My hope is that memory usage involving GHC generics decreases with GHC 8.0, since GHC generics was redesigned to encode the generic representation with -XDataKinds rather than generating a jillion empty datatypes and proxy instances.
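
As a small illustration of that redesign, here is a sketch that should compile on GHC 8.0 or later using only base. The `Color` type is hypothetical. The datatype and constructor names are recovered from the type-level metadata attached to `Rep Color` through the `Datatype` and `Constructor` classes, with no per-type metadata datatypes generated by the `deriving` clause.

```haskell
{-# LANGUAGE DeriveGeneric #-}
module Main where

import GHC.Generics

-- Hypothetical datatype, used only for illustration.
data Color = Red | Green
  deriving (Generic)

-- The outermost M1 layer of the representation carries the datatype metadata.
typeName :: String
typeName = datatypeName (from Red)

-- Red is the left branch of the sum, so its constructor metadata sits under L1.
firstConName :: String
firstConName = case from Red of
  M1 (L1 c) -> conName c
  M1 (R1 _) -> "not the first constructor"

main :: IO ()
main = do
  putStrLn typeName      -- prints "Color"
  putStrLn firstConName  -- prints "Red"
```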

roboguy13 commented 8 years ago

@andygill I should be able to look into it this weekend. I don't know too much about the details of this aspect of GHC compilation, but I'll see what I can do.

andygill commented 8 years ago

We just need a workaround.

ecaustin commented 8 years ago

I've been more rigorously testing this all day and can confirm it's an issue that is amplified by aeson-0.10.

The maximum memory residency for any combination of GHC-7.10.x and aeson-0.8 or 0.9 is under 2GB which is obviously not ideal, but it's good enough for me to continue with my work.

I'll apologize for screaming fire here since, as @xich pointed out, the hermit-shell.cabal file already disallows aeson-0.10.

RyanGlScott commented 8 years ago

> The maximum memory residency for any combination of GHC-7.10.x and aeson-0.8 or 0.9 is under 2GB which is obviously not ideal, but it's good enough for me to continue with my work.

That's somewhat encouraging to hear, since aeson-0.8 and -0.9 both have substantially simpler Data.Aeson.Types.Generic modules than -0.10. Perhaps that's worth investigating some more.

@roboguy13, can you try this:

  1. Get aeson-0.10 as a control, but also prepare a version that removes all INLINE pragmas from Data.Aeson.Types.Generic
  2. Install each and compare the memory usage when compiling pandoc-types.

Someone reported using 7GB of memory when compiling pandoc-types with aeson-0.10, so if removing the INLINE pragmas makes it better, it's probably worth opening a pull request for.

roboguy13 commented 8 years ago

@RyanGlScott Do you happen to know if the GHC developers are aware of these things yet? This might make a good test case for them (although it might be an unavoidable issue where the solution is just fewer INLINEs).

ecaustin commented 8 years ago

I believe what we're running into is an instance of this bug: https://ghc.haskell.org/trac/ghc/ticket/9630

RyanGlScott commented 8 years ago

@roboguy13, I believe that's precisely the issue. Aside from the aforementioned inlining issues, a comment on Trac #9630 has another nifty hack that might help speed things up: make each generic class have exactly one method. Apparently, if you have multiple class methods like this (taken from Data.Aeson.Types.Class):

class GToJSON f where
    gToJSON :: Options -> f a -> Value
    gToEncoding :: Options -> f a -> Encoding

it hurts the optimizer a lot. Try taking all of the generic classes in Data.Aeson.Types.Class and Data.Aeson.Types.Generic and splitting them up, e.g.,

class GToJSON f where
    gToJSON :: Options -> f a -> Value

class GToEncoding f where
    gToEncoding :: Options -> f a -> Encoding

to see if that improves things.
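
A self-contained toy version of the one-method-per-class layout, using only base, might look like the sketch below. The classes `GFieldCount` and `GConName` and the `Shape` type are hypothetical stand-ins, not aeson's actual classes; they only show the shape of the refactoring, where each generic operation gets its own class and its own set of instances over the representation types.

```haskell
{-# LANGUAGE DeriveGeneric     #-}
{-# LANGUAGE FlexibleContexts  #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators     #-}
module Main where

import GHC.Generics

-- First single-method generic class: count the fields of a value.
class GFieldCount f where
  gFieldCount :: f a -> Int

instance GFieldCount U1 where
  gFieldCount _ = 0

instance GFieldCount (K1 i c) where
  gFieldCount _ = 1

instance (GFieldCount f, GFieldCount g) => GFieldCount (f :*: g) where
  gFieldCount (l :*: r) = gFieldCount l + gFieldCount r

instance (GFieldCount f, GFieldCount g) => GFieldCount (f :+: g) where
  gFieldCount (L1 l) = gFieldCount l
  gFieldCount (R1 r) = gFieldCount r

instance GFieldCount f => GFieldCount (M1 i c f) where
  gFieldCount (M1 x) = gFieldCount x

-- Second single-method generic class: name of the constructor in use.
class GConName f where
  gConName :: f a -> String

instance (GConName f, GConName g) => GConName (f :+: g) where
  gConName (L1 l) = gConName l
  gConName (R1 r) = gConName r

instance Constructor c => GConName (M1 C c f) where
  gConName m = conName m

instance GConName f => GConName (M1 D c f) where
  gConName (M1 x) = gConName x

-- User-facing wrappers over the two generic classes.
fieldCount :: (Generic a, GFieldCount (Rep a)) => a -> Int
fieldCount = gFieldCount . from

constructorName :: (Generic a, GConName (Rep a)) => a -> String
constructorName = gConName . from

-- Hypothetical datatype, used only for illustration.
data Shape = Circle Double | Rect Double Double
  deriving (Generic)

main :: IO ()
main = do
  print (fieldCount (Rect 2 3))         -- prints 2
  putStrLn (constructorName (Circle 1)) -- prints "Circle"
```

Whether this split actually helps the optimizer in aeson's case is exactly what the experiment above is meant to find out; the sketch makes no claim about compile-time behaviour.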

andygill commented 8 years ago

I see that stackage is returning to aeson-0.9. Is this for the same reason as the HERMIT issue (slow build) or because of the Null problem, or both?

RyanGlScott commented 8 years ago

Huh, that's a surprisingly difficult question to answer. Neither this nor this elucidates much about why in particular they chose to hold it back (other than mutterings of "issues" and "regressions"). It's probably safe to assume the Null issue and compilation performance are paramount among those regressions, though, so I still think it's worth pursuing this.

ecaustin commented 8 years ago

I'm assuming both, in addition to some other bugs and regressions that aeson-0.10 introduced. There were also a large number of packages that were bumped from LTS because they either could not or would not use aeson-0.10.

ecaustin commented 8 years ago

On a related note, have we tested the Template Haskell derivation mechanism for aeson to see if that would be an acceptable short-term work-around?: https://hackage.haskell.org/package/aeson-0.10.0.0/docs/Data-Aeson-TH.html

A quick grep shows that we're Template Haskell free as of right now, so I wasn't sure if introducing it to HERMIT would be an acceptable cost for potentially faster compilation, even with aeson-0.9.
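
For anyone who hasn't used it, a minimal sketch of the Template Haskell route looks like this. The `Command` type is hypothetical and not from HERMIT; `deriveJSON` and `defaultOptions` are the names exported by Data.Aeson.TH. The splice generates both the ToJSON and FromJSON instances at compile time, so no Generic instance is involved.

```haskell
{-# LANGUAGE TemplateHaskell #-}
module Main where

import Data.Aeson (decode, encode)
import Data.Aeson.TH (defaultOptions, deriveJSON)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical datatype, used only for illustration.
data Command = Command
  { cmdName :: String
  , cmdArgs :: [Int]
  } deriving (Eq, Show)

-- Generates ToJSON and FromJSON instances for Command.
-- The splice has to appear after the datatype declaration.
$(deriveJSON defaultOptions ''Command)

main :: IO ()
main = do
  let cmd = Command "apply" [1, 2]
  -- Show the encoded JSON, then check that it decodes back to the same value.
  BL.putStrLn (encode cmd)
  print (decode (encode cmd) == Just cmd)
```

The trade-off is the one raised above: the module now needs TemplateHaskell, which is a new requirement for a codebase that is currently free of it.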

RyanGlScott commented 8 years ago

I was looking at another package (pandoc-types) that had compilation slowdowns, so I forked it and replaced all generically derived FromJSON/ToJSON instances with Template Haskell–derived instances. The results are encouraging. On my 64-bit Linux laptop with 4 GB of RAM, I compiled both versions of pandoc-types with aeson-0.10:

$ /usr/bin/time -v cabal install pandoc-types
Resolving dependencies...
Downloading pandoc-types-1.16.0.1...
Configuring pandoc-types-1.16.0.1...
Building pandoc-types-1.16.0.1...
Installed pandoc-types-1.16.0.1
        Command being timed: "cabal install pandoc-types"
        User time (seconds): 251.96
        System time (seconds): 5.90
        Percent of CPU this job got: 41%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 10:20.85
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 3048536
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 54055
        Minor (reclaiming a frame) page faults: 1332223
        Voluntary context switches: 61491
        Involuntary context switches: 15193
        Swaps: 0
        File system inputs: 3456696
        File system outputs: 123552
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
$ /usr/bin/time -v cabal install .
Resolving dependencies...
Configuring pandoc-types-1.16.1...
Building pandoc-types-1.16.1...
Installed pandoc-types-1.16.1
        Command being timed: "cabal install ."
        User time (seconds): 59.07
        System time (seconds): 1.17
        Percent of CPU this job got: 93%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04.23
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 756052
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 20
        Minor (reclaiming a frame) page faults: 410406
        Voluntary context switches: 2605
        Involuntary context switches: 3242
        Swaps: 0
        File system inputs: 34984
        File system outputs: 118000
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

tl;dr Compilation time went from over 10 minutes to 1 minute, and went from using 3 GB of RAM (and thrashing my laptop mercilessly) to less than 1 GB after switching to TH.

RyanGlScott commented 8 years ago

I also got encouraging results after applying the changes here and here to aeson-0.10. I've opened a pull request here.

ecaustin commented 8 years ago

Did you measure just compilation performance, or run-time performance too? Does reducing inlining affect encoding/decoding speed at all?

RyanGlScott commented 8 years ago

I just ran a benchmark from the aeson repo that tests the performance of the generic deriving mechanism and posted the results here. From what I can tell, my changes don't appreciably affect the runtime performance.

RyanGlScott commented 8 years ago

Good news: aeson-0.11 was just released, which fixes this issue! I just compiled hermit-shell, and HERMIT.RemoteShell.Orphanage compiled in a matter of seconds without eating up too much memory. I've adjusted hermit-shell.cabal to allow aeson-0.11 (but to disallow aeson-0.10, so that Travis won't try to pick it while we wait for hermit-shell's dependencies to upgrade to aeson-0.11 as well).

andygill commented 8 years ago

Great! I was just about to push remote-json to hackage. I'll make sure it all works with aeson-0.11, and push it. Should make building hermit a bit easier. Do you know the status of KURE?

RyanGlScott commented 8 years ago

HERMIT still needs to be overhauled to use kure-3 (and I say overhauled because it will involve a significant amount of refactoring). Sadly, I don't think I'll have much time in the foreseeable future to work on it.

ecaustin commented 8 years ago

Wow, it sounds like aeson-0.11 is a major improvement over even aeson-0.9! Awesome.