puniverse / pulsar

Fibers, Channels and Actors for Clojure
http://docs.paralleluniverse.co/pulsar/

Pulsar fibers vs. core.async `go` blocks #65

Closed: alexandergunnarson closed this issue 8 years ago

alexandergunnarson commented 8 years ago

I've read as much as I could find on this comparison (which isn't much), but I still haven't found enough detail to satisfy my curiosity. My understanding is that fibers are instrumented so as to provide true "lightweight threads"/coroutines on the JVM, while go blocks are simply scheduled onto heavyweight threads in a lightweight way via an ExecutorService (really, a thread pool), so creating one doesn't necessarily spawn a new thread. Both can be paused (parked), but they do so in different ways. My understanding is based on this comment on StackOverflow from amalloy:

<!! ... would work fine if you don't mind having a Java thread sitting there doing nothing. ... [but] go blocks ... make logical processes much cheaper; to accomplish this, the go block rewrites the body of the block into a series of callbacks that are attached to the channel, so that internally a call to <! inside a go block gets turned into something like ... (take! c k) where k is a callback to the rest of the go block.
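To make the callback rewrite concrete, here is a toy sketch in plain Clojure (1.9+ for `swap-vals!`). The names `toy-chan`, `put!`, `take!`, `dispatch` and `add-two-block` are hypothetical, and this is not how core.async is actually implemented. It only shows the idea: each `<!` becomes a `take!` whose callback is the rest of the block, and those callbacks run on a small, fixed pool of two threads.

```clojure
;; Toy illustration of the callback rewrite; hypothetical names throughout.
;; NOT core.async's actual implementation.
(import '(java.util.concurrent Executors ExecutorService ThreadFactory))

;; A small, fixed pool of daemon threads plays the role of go's thread pool.
(def ^ExecutorService pool
  (Executors/newFixedThreadPool
   2
   (reify ThreadFactory
     (newThread [_ r]
       (doto (Thread. ^Runnable r) (.setDaemon true))))))

(defn dispatch
  "Runs the zero-arg function f on the shared pool."
  [f]
  (.execute pool ^Runnable f))

(defn toy-chan
  "A channel is just state: buffered values plus parked callbacks (takers)."
  []
  (atom {:values clojure.lang.PersistentQueue/EMPTY
         :takers clojure.lang.PersistentQueue/EMPTY}))

(defn put!
  "Hands v to a parked taker if there is one, otherwise buffers v."
  [ch v]
  (let [[old _] (swap-vals! ch (fn [{:keys [takers] :as s}]
                                 (if (seq takers)
                                   (update s :takers pop)
                                   (update s :values conj v))))]
    (when-let [k (peek (:takers old))]
      (dispatch #(k v)))))

(defn take!
  "Calls k with the next value. If nothing is buffered, k is parked on the
  channel; no thread sits waiting."
  [ch k]
  (let [[old _] (swap-vals! ch (fn [{:keys [values] :as s}]
                                 (if (seq values)
                                   (update s :values pop)
                                   (update s :takers conj k))))]
    (when (seq (:values old))
      (let [v (peek (:values old))]
        (dispatch #(k v))))))

;; Conceptually: (go (let [a (<! c1) b (<! c2)] (deliver result (+ a b))))
;; Each <! becomes a take! whose callback is "the rest of the block".
(defn add-two-block [c1 c2 result]
  (dispatch
   (fn []
     (take! c1 (fn [a]
                 (take! c2 (fn [b]
                             (deliver result (+ a b)))))))))

(def c1 (toy-chan))
(def c2 (toy-chan))
(def sum-result (promise))
(add-two-block c1 c2 sum-result)
(put! c1 40)
(put! c2 2)

;; 10,000 parked "blocks" cost one closure each, not one thread each.
(def many-results
  (let [n       10000
        chans   (vec (repeatedly n toy-chan))
        results (vec (repeatedly n promise))]
    (dotimes [i n]
      (take! (chans i) (fn [v] (deliver (results i) (* 2 v)))))
    (dotimes [i n]
      (put! (chans i) i))
    results))
```

`sum-result` eventually delivers 42 regardless of whether the puts or the takes happen first, and the 10,000 callbacks in `many-results` never hold a thread while they wait; only two pool threads ever exist.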

What is the performance advantage, if any, of fibers over core.async go blocks? Are there benchmarks available? What significant implementation differences have I missed? Intuitively, fibers seem more performant, but I may be quite wrong about that, especially given issue #64, which I just posted (part of it asks about the performance cost of automatic instrumentation).

Thanks for your help!

pron commented 8 years ago

A few things:

  1. Pulsar provides a compatible implementation of core.async without some of its limitations (i.e., no difference between the `!` and `!!` versions, and the ability to block further down the call stack rather than only in the top-level expression).
  2. Pulsar/Quasar and core.async's go blocks work in exactly the same way, by performing exactly the same instrumentation. Quasar does the instrumentation at the bytecode level while core.async does it at the language level (using macros), but the transformation is the same.
  3. Because the algorithm is the same, the performance impact is the same. However, there can be differences due to Quasar's use of ForkJoinPool as the default scheduler (although you can pick a ThreadPool, as core.async does). This gives better performance in richer configurations, but perhaps worse performance in very simple configurations, especially on older JDKs.
  4. Automatic instrumentation is an experimental feature; use it only if it significantly helps you.
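Point 3 is about swapping the scheduler underneath otherwise identical code. As a JDK-only sketch (this is not Pulsar's or core.async's scheduler-configuration API, and `run-workload` and `with-executor` are hypothetical helpers), the same workload can run on a work-stealing `ForkJoinPool` or on a fixed-size thread pool through the shared `ExecutorService` interface:

```clojure
;; Same workload, two schedulers, one interface. Plain JDK executors only;
;; hypothetical helper names.
(import '(java.util.concurrent ExecutorService Executors ForkJoinPool
                               Callable Future TimeUnit))

(defn run-workload
  "Submits n small tasks to exec and returns their results in order."
  [^ExecutorService exec n]
  (let [futures (mapv (fn [i]
                        (let [^Callable task (fn [] (* i i))]
                          (.submit exec task)))
                      (range n))]
    (mapv (fn [^Future f] (.get f)) futures)))

(defn with-executor
  "Returns (f exec), always shutting exec down afterwards."
  [^ExecutorService exec f]
  (try
    (f exec)
    (finally
      (.shutdown exec)
      (.awaitTermination exec 10 TimeUnit/SECONDS))))

;; Work-stealing pool: the kind Quasar uses as its default fiber scheduler.
(def fj-results
  (with-executor (ForkJoinPool. 4) #(run-workload % 1000)))

;; Fixed-size thread pool: the kind core.async's go blocks run on.
(def tp-results
  (with-executor (Executors/newFixedThreadPool 4) #(run-workload % 1000)))
```

Because both pools implement `ExecutorService`, the workload code doesn't change; only the scheduling policy does (per-worker deques with work stealing vs. one shared task queue), which is where the performance differences pron mentions would come from.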
alexandergunnarson commented 8 years ago

Thanks (again) for your detailed point-by-point reply! I really appreciate it!

I was aware of 1), but I didn't know about 2). That's really useful to know. I ran the following rudimentary test to get a feel for some basic performance differences:

```clojure
; CORE.ASYNC VS. PULSAR

(require
  '[criterium.core :refer [quick-bench]] ; quick-bench is a macro, so refer it rather than def'ing it as a value
  '[clojure.core.async :as async]
  '[co.paralleluniverse.pulsar.async :as async+])

(defn test-1
  ; Adapted from https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
  ; 1000 go blocks each put "hi" on their own channel; the calling thread
  ; then takes all 1000 values with a blocking alts!!.
  []
  ; core.async
  ; With    Quasar agent, auto-instrumentation on:  188.272277 ms
  ; Without Quasar agent, auto-instrumentation off: 125-151 ms
  (quick-bench
    (let [n  1000
          cs (repeatedly n async/chan)]
      (doseq [c cs] (async/go (async/>! c "hi")))
      (dotimes [_ n]
        (let [[v _] (async/alts!! cs)]
          (assert (= "hi" v))))))

  ; Pulsar
  ; With    Quasar agent, auto-instrumentation on:  270.163299 ms ; 44%    slower than core.async
  ; Without Quasar agent, auto-instrumentation off: 188.913573 ms ; 24-50% slower than core.async
  (quick-bench
    (let [n  1000
          cs (repeatedly n async+/chan)]
      (doseq [c cs] (async+/go (async+/>! c "hi")))
      (dotimes [_ n]
        (let [[v _] (async+/alts!! cs)]
          (assert (= "hi" v)))))))
```

From what you're saying, then, I take it that ForkJoinPool, not Pulsar or Quasar themselves, is the performance killer in these benchmarks. I read a blog post somewhere that said something to that effect: ForkJoinPool is better than a plain thread pool when there's coordination among the (light) threads, but a thread pool is better when you spin up lots of new (light) threads and/or when there isn't much coordination involved (as in this example).
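One rough way to probe the "many independent tasks" half of that claim on your own machine is to time lots of tiny, unrelated tasks on each kind of executor. This sketch uses only JDK executors (not Pulsar's or core.async's schedulers), `time-independent-tasks` is a hypothetical helper, and a single `System/nanoTime` run with no JIT warm-up is far less trustworthy than criterium, so no expected numbers are given.

```clojure
;; Rough, single-run timing; no warm-up or statistics, unlike criterium.
;; Plain JDK executors; hypothetical helper name.
(import '(java.util.concurrent ExecutorService Executors ForkJoinPool
                               Callable Future TimeUnit))

(defn time-independent-tasks
  "Submits n tiny, independent tasks to exec, waits for all of them, and
  returns elapsed wall-clock time in milliseconds. Shuts exec down."
  [^ExecutorService exec n]
  (try
    (let [start   (System/nanoTime)
          futures (mapv (fn [i]
                          (let [^Callable task (fn [] (inc i))]
                            (.submit exec task)))
                        (range n))]
      (doseq [^Future f futures] (.get f))
      (/ (- (System/nanoTime) start) 1e6))
    (finally
      (.shutdown exec)
      (.awaitTermination exec 10 TimeUnit/SECONDS))))

(def timings
  {:fork-join   (time-independent-tasks (ForkJoinPool. 4) 100000)
   :thread-pool (time-independent-tasks (Executors/newFixedThreadPool 4) 100000)})

(println timings)
```

Since there is no coordination between these tasks, this exercises only the submission-and-dispatch cost of each pool; testing the "coordination" half would need tasks that hand work to each other, as the channel fan-in benchmark above does.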

Anyway, thanks for your comments and help!