parsonsmatt / parsonsmatt.github.io

My Github pages website

How to build modules in parallel? #60

Closed: SmartHypercube closed this issue 2 years ago

SmartHypercube commented 2 years ago

When I stack build a package, the modules in this package are always built one by one. Since you said in Keeping Compilation Fast that "GHC can compile modules in parallel", I would like to know how you achieved this.

I have tried building the latest commit of persistent with these commands. They all took the same amount of time (and built modules one by one):

stack build persistent
stack build -j4 persistent
stack build --ghc-options -j4 persistent

At the end of the blog post you gave the command stack build --fast --file-watch --ghc-options "-j4 +RTS -A128m -n2m -RTS". I have verified that the -j4 part does nothing. --fast and +RTS -A128m -n2m -RTS did help, though.

parsonsmatt commented 2 years ago

How are you checking that modules are not built in parallel? Are you sure you have a module set that can be built in parallel?

I'd expect stack build -j4 persistent to build packages in parallel, but not modules. I would expect stack build --ghc-options "-j4" to build modules in parallel. Possibly it needs the +RTS -N4 -RTS invocation too.

SmartHypercube commented 2 years ago

The steps I took:

  1. Clone https://github.com/yesodweb/persistent
  2. (Current commit is https://github.com/yesodweb/persistent/commit/b8761301ac9aef36d128346fb2bd66f190baec03 )
  3. stack build persistent
  4. stack clean
  5. (experiment A) stack build persistent
  6. stack clean
  7. (experiment B) stack build --ghc-options "-j4 +RTS -N4 -RTS" persistent
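The clean-then-build steps above can be scripted so that every configuration is timed the same way. This is a minimal sketch, assuming a Unix system (it uses Python's `resource` module to read the CPU time of finished child processes) and that it is run from the root of the persistent checkout with `stack` on PATH. The `CONFIGS` list and the `time_command` helper are hypothetical names for illustration, not part of stack.

```python
import os
import resource  # Unix-only: reads CPU time of finished child processes
import shutil
import subprocess
import sys
import time

# Hypothetical configurations to compare; edit to taste.
CONFIGS = [
    ["stack", "build", "persistent"],
    ["stack", "build", "--ghc-options", "-j4 +RTS -N4 -RTS", "persistent"],
]


def time_command(cmd):
    """Run cmd and return (real, user, sys) seconds, similar to the shell's `time`."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all terminated, waited-for descendants,
    # so the difference covers stack plus the GHC processes it spawned.
    return real, after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


def main():
    # Expect to be run from the project root (where stack.yaml lives).
    if shutil.which("stack") is None or not os.path.exists("stack.yaml"):
        print("run this from a stack project root with stack on PATH", file=sys.stderr)
        return
    for cmd in CONFIGS:
        # Clean first so every configuration recompiles all modules.
        subprocess.run(["stack", "clean"], check=True)
        real, user, system = time_command(cmd)
        print(" ".join(cmd))
        print(f"    real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")


if __name__ == "__main__":
    main()
```

Reporting user and system time alongside real time matters here: a parallel build can keep more cores busy without finishing any sooner, and only the combination of the three numbers shows that.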

Experiment A and B took the same time (40s - 44s) on my machine (with 4 cores and 8 threads). I repeated each one multiple times.

> How are you checking that modules are not built in parallel?

It didn't compile any faster.

> Are you sure you have a module set that can be built in parallel?

It's the persistent library. I saw you use it as an example in some posts. I also tried several of my own packages. None of them compiled any faster this way.

parsonsmatt commented 2 years ago

Hm. I suspect persistent may not be a great test case here. I'm seeing some differences in user/system time, but it looks like we're losing a lot of time to overhead:

stack build --fast  
    91.89s user     4.56s system    172% cpu    56.046 total

stack build --fast --ghc-options "-j4"  
    136.03s user    22.93s system   266% cpu    59.614 total

stack build --fast --ghc-options "-j4 +RTS -N4 -RTS"  
    138.91s user    24.02s system   276% cpu    59.005 total

stack build --fast --ghc-options "-j1 +RTS -N1 -RTS"  
    91.97s user     4.80s system    174% cpu    55.355 total

stack build --fast --ghc-options "-j8 +RTS -N8 -RTS"  
    321.60s user    61.05s system   455% cpu    1:24.05 total

These CLI flags do result in an increased percentage of CPU use, but we're also seeing a considerable increase in user and system time. So we do appear to be building things in parallel, but this is actually slowing things down. Whatever overhead is incurred by parallel builds, on persistent at least, dominates any potential gains from parallelism.
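The CPU percentage that `time` prints is (user + system) / wall-clock time, so the figures above can be checked, and the extra CPU work done by the parallel builds quantified, with a few lines of Python. The numbers are copied from the output above (every run used --fast; the labels are the extra --ghc-options). The dictionary layout and helper names are only for illustration, and the computed percentages agree with the reported ones to within about one percentage point of rounding.

```python
# (user s, system s, wall-clock s) copied from the `time` output above.
RUNS = {
    "--fast": (91.89, 4.56, 56.046),
    "-j4": (136.03, 22.93, 59.614),
    "-j4 -N4": (138.91, 24.02, 59.005),
    "-j1 -N1": (91.97, 4.80, 55.355),
    "-j8 -N8": (321.60, 61.05, 84.05),  # 1:24.05 total
}
BASELINE = "--fast"


def cpu_percent(user, system, wall):
    """Average number of busy cores as a percentage (what `time` prints as cpu)."""
    return 100 * (user + system) / wall


def cpu_seconds_ratio(name):
    """Total CPU seconds of a run relative to the serial baseline."""
    user, system, _ = RUNS[name]
    base_user, base_system, _ = RUNS[BASELINE]
    return (user + system) / (base_user + base_system)


for name, (user, system, wall) in RUNS.items():
    print(f"{name:8} cpu {cpu_percent(user, system, wall):5.1f}%  "
          f"cpu-seconds x{cpu_seconds_ratio(name):.2f}  wall {wall:6.2f}s")
```

The -j8 run spends roughly four times the CPU seconds of the serial build (about 383 s versus 96 s) and still takes longer on the wall clock, which is the overhead described above.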

SmartHypercube commented 2 years ago

You are correct. I just wrote a test package that contains four big modules that don't depend on each other and one small module that imports them all. The results are:

stack build
    real    0m12.584s
    user    0m12.264s
    sys     0m0.314s

stack build --ghc-options "-j2"
    real    0m7.967s
    user    0m13.639s
    sys     0m1.113s

stack build --ghc-options "-j4"
    real    0m5.227s
    user    0m15.058s
    sys     0m2.260s

stack build --ghc-options "-j8"
    real    0m5.844s
    user    0m19.997s
    sys     0m9.322s

stack build --ghc-options "-j2 +RTS -N2 -RTS"
    real    0m8.014s
    user    0m13.642s
    sys     0m1.288s

stack build --ghc-options "-j4 +RTS -N4 -RTS"
    real    0m4.874s
    user    0m14.510s
    sys     0m1.792s

stack build --ghc-options "-j8 +RTS -N8 -RTS"
    real    0m5.427s
    user    0m19.156s
    sys     0m7.794s

It seems the -N part is not needed. Also, there is indeed some overhead with large -j and -N values.
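Wall-clock speedup relative to the plain stack build run, and that speedup divided by the -j value (a rough parallel efficiency), make the trend in these numbers easier to see. A small sketch using the real times reported above; the helper names are illustrative only.

```python
# Real (wall-clock) seconds from the test package runs above.
SERIAL = 12.584
REAL = {
    "-j2": 7.967,
    "-j4": 5.227,
    "-j8": 5.844,
    "-j2 +RTS -N2": 8.014,
    "-j4 +RTS -N4": 4.874,
    "-j8 +RTS -N8": 5.427,
}


def speedup(real):
    """How many times faster than the serial build."""
    return SERIAL / real


def efficiency(jobs, real):
    """Speedup per job slot; 1.0 would mean every slot was fully used."""
    return speedup(real) / jobs


for flags, real in REAL.items():
    jobs = int(flags.split()[0][2:])  # "-j4 +RTS ..." -> 4
    print(f"{flags:14} speedup {speedup(real):.2f}x  "
          f"efficiency {efficiency(jobs, real):.2f}")
```

With only four independent big modules, no more than four can compile at the same time, so any -j above 4 has no extra module-level parallelism to exploit and can only add overhead. That is consistent with the -j8 runs being slightly slower than -j4.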

Anyway, thanks for replying to my comment!