Enable parallel builds

Ecco commented 5 years ago

Builds can take a while. CPUs are getting more and more cores. Let's use them.

Steps to reproduce

nanoc compile
wait
wait
wait some more

Expected behavior

Just like make -j N, it would be great if nanoc could build in parallel and use many cores.

Actual behavior

Nanoc processes items sequentially.

denisdefreyne commented 5 years ago

Hi Ecco,

Parallel compilation is on my wish list too, but it’s not trivial, for two reasons:

The standard Ruby implementation (usually referred to as CRuby, KRI, or MRI) has a global interpreter lock or global VM lock (abbreviated GIL or GVL, respectively), which allows only one thread to execute Ruby code at a time, so the interpreter effectively uses a single CPU core. Threads still help with IO-heavy operations, because the lock is released while a thread waits for an IO operation to complete, letting other threads do CPU work in the meantime. However, the research I’ve done on Nanoc web sites shows that they are all CPU-heavy, and thus wouldn’t benefit much from parallelization.
JRuby is an alternate Ruby implementation that does not have a GIL/GVL, but it incurs a startup cost, which can be significant for relatively short-running processes such as Nanoc.

Therefore, while it’s certainly possible to parallelize Nanoc, there would be no significant measurable benefit.
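To make the GVL point concrete, here is a minimal Ruby sketch, assuming CRuby. It times a CPU-bound busy loop and a `sleep` call (a stand-in for a blocking IO wait), each run four times sequentially and then from four threads. On CRuby the threaded CPU-bound run should take roughly as long as the sequential one, while the sleeps overlap; exact timings depend on the machine, and on JRuby the CPU-bound threads could genuinely run in parallel.

```ruby
# Minimal sketch, assuming CRuby: threads overlap waiting, not Ruby CPU work.
# Timings are illustrative and vary by machine and Ruby implementation.

# Returns the wall-clock seconds taken by the given block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# CPU-bound stand-in: pure Ruby arithmetic, which needs the GVL throughout.
def cpu_work
  x = 0
  1_000_000.times { |i| x += i % 7 }
  x
end

# Waiting stand-in: sleep releases the GVL, as a blocking IO call would.
def wait_work
  sleep 0.2
end

count = 4

cpu_sequential = elapsed { count.times { cpu_work } }
cpu_threaded   = elapsed { Array.new(count) { Thread.new { cpu_work } }.each(&:join) }

wait_sequential = elapsed { count.times { wait_work } }
wait_threaded   = elapsed { Array.new(count) { Thread.new { wait_work } }.each(&:join) }

printf("CPU-bound:  sequential %.2fs, %d threads %.2fs\n", cpu_sequential, count, cpu_threaded)
printf("Wait-bound: sequential %.2fs, %d threads %.2fs\n", wait_sequential, count, wait_threaded)
```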

One thing that you can do to speed up the compilation of your site is to figure out where the slowness is coming from. Run nanoc compile with the -VV option (which is the same as --verbose --verbose). This will print detailed results of where Nanoc spends its time. While it’s not 100% accurate (because measuring time is hard), it will show you what is taking up the most time.

In particular, the list of filters is interesting to look at; it might be worth using different filters, or optimising slow ones. For example, in one site that I worked on, I swapped out a filter that uses pygmentize in favour of one that uses pygments.rb, yielding a speedup of more than 10x.

I’ve also attempted to run Nanoc within Docker for Mac for a while, but the slow filesystem led to a 10x-20x slowdown for Nanoc. (This might not be relevant for you, but I found it worth mentioning.)

I’ve also had some success with Bootsnap, which can speed up nanoc invocations quite a bit.

I’ve spent a lot of effort optimising Nanoc, and (I hope) you can see that in the fact that repeated builds are significantly faster than clean builds. In some sites that I work on, I can have an editor and a live-reloading browser window open next to each other, and have my changes show up with sub-second latency. That’s the experience that I aim for, even for large Nanoc sites, although that’s not always possible.

At this point, however, I believe I might be hitting the limits of what is achievable with Ruby. I’ve experimented with partially reimplementing Nanoc’s core in pre-compiled languages, and that’s certainly a direction that I want to explore further, as I believe there’s a lot of untapped potential.

What are your thoughts?

Ecco commented 5 years ago

Hi @ddfreyne ! First of all, thank you very much for such a nice and comprehensive answer!

Our use-case might be a bit special: 99% of the time is spent generating PDFs with PDFKit using a custom Nanoc::Filter. Generating a single item takes over 10 seconds, while other items (html, css) are usually far below 0.1 sec.

Those items are all independent of one another, and I'm pretty sure it would be possible to run the filter in parallel. Anyway, I'm going to look at how I could make the filter faster and will keep you posted 😄

denisdefreyne commented 5 years ago

@Ecco That’s a good point! PDF generation is notoriously slow, and this case might indeed be something that can be parallelized. I’ll need to think a bit more about how to make this work, though.

denisdefreyne commented 5 years ago

I’ve started an experiment to make Nanoc compile items in parallel. You can find it at nanoc/nanoc/pull/1385 — but be warned that this is highly experimental and very much work in progress.

denisdefreyne commented 5 years ago

@Ecco Is there a chance that I can get hold of the source for the web site that you’re talking about? It’d help me in building a properly-parallelized Nanoc.

Ecco commented 5 years ago

> Is there a chance that I can get hold of the source for the web site that you’re talking about?

Unfortunately, not as is. But I guess I could make a minimum example that reproduces the exact issue we're running into. 99% of that website's sources aren't relevant to this anyway :)

denisdefreyne commented 5 years ago

@Ecco A minimal example would be quite useful!

Ecco commented 5 years ago

Here goes :)

nanoc-demo-mt.zip

denisdefreyne commented 5 years ago

Look at that:

% bundle exec nanoc
Loading site… done
Compiling site…
      create  [0.01s]  output/index.html
      create  [7.11s]  output/index/index.pdf
      create  [0.00s]  output/two/index.html
      create  [6.96s]  output/two/index.pdf
      create  [0.00s]  output/one/index.html
      create  [0.00s]  output/stylesheet.css
      create  [6.91s]  output/one/index.pdf
      create  [0.00s]  output/three/index.html
      create  [6.94s]  output/three/index.pdf

Site compiled in 27.95s.

% bundle exec nanoc
Loading site… done
Compiling site…
      create  [0.02s]  output/stylesheet.css
      create  [0.02s]  output/index.html
      create  [0.02s]  output/two/index.html
      create  [0.03s]  output/one/index.html
      create  [0.04s]  output/three/index.html
      create  [8.49s]  output/three/index.pdf
      create  [8.50s]  output/index/index.pdf
      create  [8.50s]  output/one/index.pdf
      create  [8.50s]  output/two/index.pdf

Site compiled in 8.53s.

denisdefreyne commented 5 years ago

There’s quite a bit more work to do, but the basic stuff is there.

I suppose I also need to start thinking about how to report all the recorded durations, since they don’t add up anymore.

Ecco commented 5 years ago

Oh, wow 😮 Color me impressed! I definitely need to take a look at that branch then 😄

As for the way durations are reported, I think the current output is actually great. It's rather clear what each output corresponds to, and they don't really need to add up.
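The mismatch is easy to reproduce in isolation. The Ruby sketch below times each item inside its own thread and also times the whole run; the item names and the 0.3-second `sleep` are hypothetical stand-ins for the slow PDF filters. The per-item durations stay meaningful on their own, as suggested above, but their sum exceeds the wall-clock time, which is what an overall "compiled in" total measures.

```ruby
# Sketch: per-item durations measured in parallel workers overlap, so their
# sum exceeds the wall-clock time of the whole run.
# Item names and the sleep-based "work" are hypothetical stand-ins.

def now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

items = %w[index/index.pdf one/index.pdf two/index.pdf three/index.pdf]

wall_start = now
threads = items.map do |name|
  Thread.new do
    t0 = now
    sleep 0.3 # stand-in for a slow filter such as PDF generation
    [name, now - t0] # Thread#value returns this pair
  end
end
durations = threads.map(&:value) # joins each thread, keeps item order
wall = now - wall_start

durations.each { |name, d| printf("[%.2fs] %s\n", d, name) }
summed = durations.sum { |_name, d| d }
printf("sum of item durations: %.2fs, wall-clock: %.2fs\n", summed, wall)
```

Reporting both numbers side by side avoids the question of which one is "right": the per-item figures show where the time goes, and the wall-clock figure shows what the user actually waits for.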

denisdefreyne commented 5 years ago

Unfortunately, for sites that don’t run external processes, the parallel version of Nanoc is 5% to 10% slower. I’ll need to investigate, but it’s probable that the threading/locking/context-switching overhead is causing it.

Ecco commented 5 years ago

Hmm, that's a bummer. Maybe nanoc could use a flag like make -jN?
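A `make -jN`-style option essentially boils down to a bounded worker pool. Below is a minimal Ruby sketch of one, using only the standard `Queue` (with `close`, available since Ruby 2.3) and `Thread`. The `parallel_map` name and its `jobs:` keyword are hypothetical, for illustration only; they are not a Nanoc API or command-line flag. Results come back in input order, and at most `jobs` items are processed at the same time.

```ruby
# Sketch of a make -jN style worker pool. `parallel_map` and `jobs:` are
# hypothetical names for illustration, not part of Nanoc.

def parallel_map(items, jobs:)
  raise ArgumentError, "jobs must be >= 1" if jobs < 1

  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once drained, pop returns nil and workers exit

  results = Array.new(items.size)
  lock = Mutex.new # guards results on implementations without a GVL
  workers = Array.new([jobs, items.size].min) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        value = yield(item) # run the (possibly slow) work outside the lock
        lock.synchronize { results[index] = value }
      end
    end
  end
  workers.each(&:join) # re-raises in the caller if a worker failed
  results
end

# Usage: square eight numbers with at most three running concurrently.
running = 0
peak = 0
counter = Mutex.new
squares = parallel_map((1..8).to_a, jobs: 3) do |n|
  counter.synchronize { running += 1; peak = [peak, running].max }
  sleep 0.05 # stand-in for a slow filter
  counter.synchronize { running -= 1 }
  n * n
end
p squares # prints [1, 4, 9, 16, 25, 36, 49, 64]
p peak    # never more than 3
```

Capping the number of workers keeps the CPU and memory footprint predictable when many items are slow, which is the same reason `make` takes a job count instead of starting every target at once.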

denisdefreyne commented 5 years ago

The slowdown is also noticeable when running with a single thread (using the new implementation).

I’ve measured lock contention, but it’s small (under 0.5%; I don’t have more detailed results).
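One rough way to obtain such a figure is to time how long threads wait to acquire a lock and compare that with the total time the threads run. The thread does not say how the number above was measured; the Ruby sketch below only illustrates the idea, using a hypothetical `TimedMutex` wrapper (not a Ruby or Nanoc class) around the standard `Mutex`. The measured wait also includes some thread-scheduling delay, so treat the result as an upper bound.

```ruby
# Sketch: measure time spent waiting to acquire a lock. TimedMutex is a
# hypothetical helper, not a Ruby or Nanoc class.

class TimedMutex
  attr_reader :wait_time # seconds spent waiting, summed over all threads

  def initialize
    @mutex = Mutex.new
    @wait_time = 0.0
  end

  def synchronize
    t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    @mutex.synchronize do
      # Updated while holding @mutex, so no extra lock is needed.
      @wait_time += Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
      yield
    end
  end
end

lock = TimedMutex.new
shared = []
thread_count = 4

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
threads = Array.new(thread_count) do |t|
  Thread.new do
    50.times do |i|
      sleep 0.001                           # work done without the lock
      lock.synchronize { shared << [t, i] } # short critical section
    end
  end
end
threads.each(&:join)
total = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

# Compare summed waiting time with summed thread running time.
contention = lock.wait_time / (thread_count * total)
printf("lock wait: %.4fs, contention: %.2f%%\n", lock.wait_time, contention * 100)
```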

denisdefreyne commented 5 years ago

While still experimental, I think the PR is in pretty good shape by now:

All existing tests pass, and it works correctly on the handful of Nanoc sites that I have here (both small and large ones).
After some refactoring (outside of the PR), the slowdown is not as large anymore. There will always be some slowdown due to the extra overhead, but it doesn’t seem to exceed 5%, so I believe it’s fine.

The PR introduces quite a bit of code that is not yet thoroughly covered by unit and integration tests. I suppose that now is a good time to start working on that.

I think you can test out this branch on your own project, but do let me know if you run into unexpected behavior!

denisdefreyne commented 3 years ago

Ruby 3.0 opens up new possibilities here, via Ractors. I’d love to make use of this, though it would mean dropping support for Ruby 2.x. I think it’s too early for this, as Ruby 3.0 is quite new and Ruby 2.6 and 2.7 are still supported.
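For a sense of what Ractors offer, here is a minimal sketch assuming Ruby 3.0 or later. Each Ractor computes a naive Fibonacci number, which is pure CPU work; unlike threads, Ractors can run such work on several cores at once. Ruby prints an "experimental" warning when the first Ractor is created. Ruby 3.0 through 3.4 fetch a Ractor's result with `Ractor#take`, while later releases moved to `Ractor#value`, so the sketch checks which one is available. The `fib` workload is purely illustrative.

```ruby
# Minimal sketch, assuming Ruby 3.0+. Ractors are experimental; Ruby warns
# when the first one is created.

# Deliberately slow, CPU-bound workload for illustration.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Fetch a Ractor's final value: newer Rubies use #value, 3.0-3.4 use #take.
def ractor_result(ractor)
  ractor.respond_to?(:value) ? ractor.value : ractor.take
end

inputs = [25, 26, 27, 28]

# Pass each input as an argument: a Ractor block may not capture outer locals.
ractors = inputs.map { |n| Ractor.new(n) { |k| fib(k) } }
results = ractors.map { |r| ractor_result(r) }

p results # prints [75025, 121393, 196418, 317811]
```

The isolation rule visible here (no captured locals, only shareable or copied objects cross Ractor boundaries) is also what makes adopting Ractors in an existing code base harder than adding threads.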

Ecco commented 3 years ago

I don't know if I'm biased, but I always use a ruby version manager. As a result, installing any version of Ruby is really not a concern for me. I would assume most Ruby devs also do, but I don't know 😄

dseomn commented 2 months ago

If people are still interested in this, would it make sense to reconsider it now? From https://www.ruby-lang.org/en/downloads/branches/ it looks like 2.7 is end-of-life. (I'd really like to switch to nanoc, but I've got about 2500 images to resize to multiple sizes each, and that seems like a good use of parallelism as long as the GIL/GVL is released when executing an external program.)

denisdefreyne commented 2 months ago

Unfortunately, even in the most recent Ruby version (3.3), ractors are still experimental and not usable in production-like settings. I wrote up some more detail on the lack of parallelism in Nanoc.

I am not sure where ractors are headed in the future, but I am keeping my eyes peeled.

There would be some benefit to using threads when using external processes (e.g. for resizing images), but I’d much prefer to use ractors, because that’d be far more impactful.
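The thread-based option works because a Ruby thread that is waiting for a child process releases the GVL, so several external commands can run at once even on CRuby. The minimal sketch below uses a short-lived Ruby subprocess that just sleeps as a stand-in for a real image-resizing command (which tool to call, and with what arguments, is left as an assumption) and compares running four of them one after another with running them from four threads.

```ruby
# Minimal sketch: threads waiting on child processes release the GVL.
# A sleeping Ruby subprocess stands in for a real resize command.
require "rbconfig"

def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Hypothetical stand-in for an external image-resizing command.
def run_external(seconds)
  ok = system(RbConfig.ruby, "--disable-gems", "-e", "sleep #{seconds}")
  raise "external command failed" unless ok
end

jobs = 4
sequential = elapsed { jobs.times { run_external(0.3) } }
threaded = elapsed do
  Array.new(jobs) { Thread.new { run_external(0.3) } }.each(&:join)
end

printf("sequential: %.2fs, %d threads: %.2fs\n", sequential, jobs, threaded)
```

With thousands of images, one would cap the number of concurrent processes (for example at the number of CPU cores) rather than start one thread per image.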

nanoc / features

Enable parallel builds #49
