oracle / graaljs

A high-performance, ECMAScript compliant, and embeddable JavaScript runtime for Java
https://www.graalvm.org/javascript/
Universal Permissive License v1.0

Node.js vs Graal.js Performance #74

Open weixingsun opened 5 years ago

weixingsun commented 5 years ago

Dude,

I came across GraalVM, had a glance at the JVM options part, and thought it looked promising. But I found that its performance is much lower than the latest Node.js. Here is the result: https://github.com/weixingsun/perf_tuning_results/blob/master/Node.js%20vs.%20GraalVM

Any idea about the difference?

wirthi commented 5 years ago

Hi @weixingsun

thanks for your question. I am trying to understand what your benchmark script (test_graal.sh) is doing. It obviously does something, and terminates after ~130 seconds on my machine, but CPU utilization is <1% most of the time, so that does not look like a reasonable benchmark to me.

I can execute the application itself (node application.js) and benchmark it with a tool like wrk, that really stresses the fib calculation. With that I get the following numbers:

On that benchmark, GraalVM even outperforms Node. But note that your fib calculation blocks the event loop, so you can only do one calculation at a time, serve only one request at a time (you usually want to avoid exactly that when using Node.js) - all requests are serialized and calculated one after the other. So you are measuring hardly any Node.js/express code - this benchmark almost exclusively measures core JavaScript via the fibonacci calculation (for a 30 sec benchmark, only 28 iterations are run through Node.js/express; the time is spent in the fibonacci function - which is fine, if you want to measure pure Javascript core performance).

I am using wrk -t5 -c10 -d30s http://localhost:8080/fib to measure (that's my typical Node.js benchmark setting; using 5 threads and 10 connections is actually overkill on this serialized benchmark, as stated above).
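The serialization described above can be reproduced without an HTTP server. The following is a minimal sketch, not the original application.js (which is not shown in this thread); the `fib` implementation and the timer are illustrative assumptions. A zero-delay timer stands in for a second incoming request: it cannot run until the synchronous `fib` call has returned.

```javascript
// Naive recursive Fibonacci: pure CPU work, no I/O, never yields to the event loop.
function fib(n) {
  return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

const scheduledAt = Date.now();

// Stand-in for a second request: it is due immediately, but callbacks only
// run once the current synchronous work has finished.
setTimeout(() => {
  console.log(`timer ran after ${Date.now() - scheduledAt} ms`);
}, 0);

// Stand-in for the first request: blocks the single JavaScript thread.
const result = fib(27);
console.log(`fib(27) = ${result}, computed in ${Date.now() - scheduledAt} ms`);
```

The delay reported by the timer is at least as long as the `fib` call itself, which is why extra wrk connections cannot raise throughput for an endpoint like this.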

Can you please help me understand what you are trying to measure with the test_graal.sh script? Maybe I am missing something.

Best, Christian

weixingsun commented 5 years ago

@wirthi thanks for your reply. I just want to saturate a certain core in my server. The main workload consists of 2 simple GET methods (fib/fast), called in parallel as one iteration. With this method, I can easily see how long 100 consecutive iterations take.

I can see that they occupy 100% of user cycles, which means I created a bottleneck on cpu3:

[root@dr1 cpu_bond]# mpstat -P 3 3 3
Linux 3.10.0-862.11.6.el7.x86_64 (dr1)  11/15/2018  _x86_64_  (112 CPU)

07:27:43 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
07:27:46 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
07:27:49 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
07:27:52 PM    3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

weixingsun commented 5 years ago

oops, test_graal.sh creates the bottleneck on cpu 2; the log above is for test_v8.sh

woess commented 5 years ago

We have two execution modes, "native" (default) and "JVM" (see https://www.graalvm.org/docs/reference-manual/languages/js/ for more information). Setting jvm options switches to the JVM. Currently, fibonacci is significantly faster in native mode, try running without jvm options.

weixingsun commented 5 years ago

@woess Thanks for explaining the modes, but I got 186.197s after removing all the jvm options. what vm is underneath? Nashorn or GraalVM?

perf record gave me following stacktraces:

Samples: 1K of event 'cycles:ppp', Event count (approx.): 53465728779
Overhead  Command          Shared Object   Symbol
2.00%     node             perf-29614.map  [.] 0x00007fd3e71580cb
1.55%     node             libpolyglot.so  [.] com.oracle.truffle.js.nodes.function.FunctionBodyNode.execute(com.oracle.truffle.api.frame.VirtualFrame)java.lang.Object
1.42%     node             libpolyglot.so  [.] com.oracle.svm.core.genscavenge.GCImpl.blackenBootImageRoots()void
1.21%     node             perf-29614.map  [.] 0x00007fd3e7158242
1.20%     node             perf-29614.map  [.] 0x00007fd3e71588ec
1.14%     node             perf-29614.map  [.] 0x00007fd3e7158000
0.99%     node             perf-29614.map  [.] 0x00007fd3e71583f0
0.93%     node             perf-29614.map  [.] 0x00007fd3e715859d
0.92%     node             perf-29614.map  [.] 0x00007fd3e7158007
0.82%     pilerThread-156  libpolyglot.so  [.] org.graalvm.collections.EconomicMapImpl.grow()void
......
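When the same script is run under both stock node and GraalVM's node, it can help to log which engine is actually underneath. The sketch below assumes that GraalVM JavaScript exposes a global `Graal` object with a `versionGraalVM` property, as its compatibility documentation describes; the property name may differ between releases, and `describeEngine` is a hypothetical helper name.

```javascript
// Reports which JavaScript engine the script is running on.
// Assumption: GraalVM JavaScript defines a global `Graal` object; stock Node.js does not.
function describeEngine() {
  const onGraal = typeof Graal !== 'undefined';
  return {
    engine: onGraal ? 'GraalVM JavaScript' : 'not GraalVM (V8 on stock Node.js)',
    // Assumed property name; null if it is missing in this release.
    graalVersion: onGraal && typeof Graal.versionGraalVM === 'string'
      ? Graal.versionGraalVM
      : null,
    // Version of the Node.js layer, available on both runtimes.
    nodeVersion: typeof process !== 'undefined' ? process.version : null,
  };
}

console.log(describeEngine());
```

Printing this at the top of a benchmark run makes it harder to mix up results when only PATH decides which node binary is used.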

EdwardDrapkin commented 5 years ago

I was curious as well, so I figured I could provide a real life benchmark, running a webpack build. This was entirely unscientific and the tests were only run once.

The results were surprising. Here are the relevant files: https://gist.github.com/EdwardDrapkin/d1b380787821462c5677323614f20146

The results wound up:

Node 11:

real    0m3.361s
user    0m4.747s
sys     0m0.396s

Graal native:

real    1m18.097s
user    2m51.988s
sys     0m13.533s

Graal JVM:

real    1m5.169s
user    5m21.155s
sys     0m4.549s

Graal JVM with --jvm.XX:+UseG1GC:

real    1m13.938s
user    6m37.463s
sys     0m4.333s

wirthi commented 5 years ago

Hi @EdwardDrapkin

thanks for sharing your benchmark. I am no expert on webpack - I guess the modules/pp3/ is the actual thing you pack? You didn't provide that in your gist?

Note that, unlike the peak-performance benchmark weixingsun posted above, yours is heavy on startup: it is a tool that is executed once. If even original Node finishes it in 3 seconds, Graal-Node.js will have a hard time keeping up. Graal-Node.js requires more time to JIT-compile the source code it gets. This makes it slower on workloads like npm, webpack or similar, that is, anything that runs only for a short time and only once. However, a factor of >20 as you experience it is more than we usually see.

If I could reproduce your run fully, I'd love to look into it and see if there is anything we can optimize for.

Best, Christian

EdwardDrapkin commented 5 years ago

I can't provide the actual source code we use at work, but it's a fairly straightforward React project. You'd get similar results if you copied any react project in there. I will note that I switched the TS language service in IntelliJ to use GraalVM instead of NodeJS, and while it's exceptionally painful for a good long while, after about an hour it feels faster but AFAIK there's no way to benchmark proprietary IntelliJ plugins.

i-void commented 5 years ago

Create a simple Nuxt.js project, selecting yarn as the default package manager, and run yarn run dev. You simply don't need any benchmark results: Graal is slower by 3 minutes or more for a simple build. For complex projects with over 350 modules, the difference goes up to 10-15 minutes just for the build. This is not in an acceptable range for using it. It also gives errors and cannot start.

re-thc commented 5 years ago

Is startup performance not going to be considered? Having to run both graaljs and nodejs in parallel is going to be confusing. I thought the point of graal was to have 1 tool that does it all and have the interop?

wirthi commented 5 years ago

Hi @hc-codersatlas

we are currently working on significant startup improvements by AOT-compiling larger parts of the Node.js codebase. This is a significant engineering effort though, so it takes a while.

Best, Christian

4ntoine commented 5 years ago

Hey, i'm also interested in it.

I've just measured Node.js vs GraalVM's node performance, and the latter is 10-20x slower. Is there any possible reason, or are optimizations turned off? I think I will be able to provide the sources for benchmarking or do proper benchmarking (for now I just replaced calls to node with GraalVM's node, without any additional arguments).

graalvm-ee-19.1.1 node.js v8.9.0

thomaswue commented 5 years ago

Is this for startup or peak performance? Can you share the workload as suggested?

4ntoine commented 5 years ago

Hi, Thomas. Thanks for reply.

It's the rough execution time in milliseconds of exactly the same code on node and GraalVM's node (startup time excluded from the measurement). It includes processing of stdin, parsing (mostly string and regexp operations), instantiating objects, and calling object methods with some business logic.

I think i will be able to provide the code, will doublecheck it.

4ntoine commented 5 years ago

graalvm-bechmark.zip

Just run run.sh with node on PATH:

./run.sh

or graalvm node on PATH, eg:

PATH=/Users/asmirnov/Documents/dev/src/graalvm-ee-19.1.1/Contents/Home/bin:$PATH ./run.sh

It will clone the required JS code, prepare the data, and run the benchmark; see the actual execution time. Let me know if you need any assistance or find the root cause.

wirthi commented 5 years ago

Hi @4ntoine

thanks for your code. I confirm we can execute it and measure performance.

Your benchmark does not consider warm-up. To mitigate that, you can put a loop around the core of your benchmark (lines 153ff in benchmark.js) and measure each iteration independently. However, that might not give exact results due to caching in the code. We are working on some micro-benchmarks to better measure the performance. But it seems we are within 2.5X of original Node if you account for the warmup.
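The loop-and-measure approach suggested above could look roughly like the following. This is a sketch only: `workload` is a hypothetical stand-in for the benchmark core (the real benchmark.js is in the attached archive and not reproduced here), and the choice of 3 discarded rounds is an assumption that depends on the workload and runtime.

```javascript
const { performance } = require('perf_hooks');

// Runs `workload` `rounds` times and returns the wall-clock time of each round in ms.
function timeRounds(workload, rounds) {
  const timings = [];
  for (let i = 0; i < rounds; i++) {
    const start = performance.now();
    workload();
    timings.push(performance.now() - start);
  }
  return timings;
}

// Hypothetical stand-in for the benchmark core: string and regexp work.
function workload() {
  let hits = 0;
  for (let i = 0; i < 20000; i++) {
    if (/^item-(\d+)$/.test('item-' + i)) hits++;
  }
  return hits;
}

const timings = timeRounds(workload, 10);
const warmupRounds = 3; // assumption: how many rounds to discard is workload-specific
const steady = timings.slice(warmupRounds);
const mean = steady.reduce((a, b) => a + b, 0) / steady.length;

timings.forEach((t, i) => console.log(`round ${i + 1}: ${t.toFixed(1)} ms`));
console.log(`mean after warm-up: ${mean.toFixed(1)} ms`);
```

Reporting every round, rather than one total, makes it visible whether the runtime is still warming up when the measurement ends.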

Also, note that running in JVM mode (node --jvm benchmark.js) gives a better peak performance than in native mode.

We'll get back to you once we know more. Also, improving our warmup performance is high up our list, so that should get better over the next releases.

Best, Christian

4ntoine commented 5 years ago

Hey.

Thanks for the update.

caching in the code

Yup, there is some caching and i can modify it to avoid side effect of caching for better benchmarking.

But it seems we are within 2.5X of original Node if you account for the warmup.

Does it mean you target to have 2.5x worse performance compared to Node?

thomaswue commented 5 years ago

Our target is to be at least comparable speed or better for any workload. This is a longterm target however and we aren't there yet for Node.js applications.

In-line commented 4 years ago

Running graal/bin/node yarn start in a React project is significantly slower than with stock NodeJS.

It takes around 15 minutes to start compiling, and I didn't wait after that.

Stock node does that in around 2 minutes.

frank-dspeed commented 4 years ago

I think it should maybe be documented that node-graalvm is not yet as optimized as it could be, and that at present the startup time is higher and the performance is lower for processes that are not long-running, so that not everyone is shocked.

thomaswue commented 4 years ago

Agreed that we should put the information on startup into the documentation. On peak performance it is not so clear as there are also workloads where we are faster.

Ivan-Kouznetsov commented 4 years ago

I note that there are cases where GraalVM Node.js performs slower than Node.js after many iterations of the same task, which does not appear to be caused by start up time. I created a repo that illustrates that GraalVM performs slower than Node.js at:

  1. Regex (1000 iterations of regex-redux task)
  2. JSONPath queries (1000 iterations using 2 different libraries)
  3. HTTP GET requests (10,000 iterations using lightweight library)

I hope you will find it useful: https://github.com/Ivan-Kouznetsov/graalvm-perf

frank-dspeed commented 4 years ago

@ivan you need to account for this correctly: NodeJS will be faster in those cases most of the time, but when you replace the regex with Java's regex, the JSON parser and query element with the Java ones, and the HTTP GET method with the one from Java, you outperform NodeJS by far.
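As an illustration of what swapping in a Java implementation means, here is a minimal sketch for the regex case. It assumes GraalVM JavaScript's `Java.type` interop function, which is generally only usable when Java host classes are available (for example in `--jvm` mode); `countMatches` is a hypothetical helper, and the sketch makes no claim about which path is faster. On stock Node.js, or when host access fails, it falls back to the built-in RegExp.

```javascript
// Counts non-overlapping matches of `pattern` in `text`.
// Uses java.util.regex when Java interop is available, otherwise JavaScript's RegExp.
// Note: Java and JavaScript regex syntax differ for some constructs, so only
// patterns valid in both dialects are safe here.
function countMatches(text, pattern) {
  if (typeof Java !== 'undefined') {
    try {
      const Pattern = Java.type('java.util.regex.Pattern');
      const matcher = Pattern.compile(pattern).matcher(text);
      let count = 0;
      while (matcher.find()) count++;
      return count;
    } catch (e) {
      // Host class lookup not permitted or not available: fall through to RegExp.
    }
  }
  return (text.match(new RegExp(pattern, 'g')) || []).length;
}

console.log(countMatches('agggtaaa tttaccct', 'a'));
```

Any speed comparison between the two paths would need the same warm-up treatment as the other benchmarks in this thread.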

4ntoine commented 4 years ago

@thomaswue

Our target is to be at least comparable speed or better for any workload. This is a longterm target however and we aren't there yet for Node.js applications.

Are we there at the moment? Any benchmarks/comparisons available? Thanks

frank-dspeed commented 4 years ago

@4ntoine the state is still the same: everything that uses nodejs modules from node-graaljs is slower.

If you use only Java or JavaScript, it is faster.

wirthi commented 3 years ago

Hi @Ivan-Kouznetsov

thanks for your benchmarks, they provide relevant insight! And they show the fundamental misconception, which is

after many iterations of the same task

1000 iterations of something is not "many" in the JIT world. As per your documentation (original) Node.js would need 0.120s for the full jsonpath-classic-benchmark.js. GraalVM is in the Java world - and there, it would take a few hundred milliseconds to even start the JVM, let alone execute the benchmark. Thanks to native-image, we can be faster on GraalVM, but the same basic principle still applies: we need to JIT-compile the code, and that won't fully happen within 120 milliseconds.

I've hacked some proper warmup into your benchmark, like this (e.g. for jsonpath-classic-benchmark.js):

const jsonPath = require('./lib/jsonPath');
// Inner loop count; can be overridden from the command line.
const n = Number(process.argv[2]) || 10000;

// One full run of the original benchmark.
function test() {
  const sampleObj = {name:"john",job:{title:"developer", payscale:3}};
  let len=0;
  for(let i=0;i<n;i++){
    len += jsonPath(sampleObj,"$..name").toString().length;
    len += jsonPath(sampleObj,"$..payscale").toString().length;
    len += jsonPath(sampleObj,"$..age").toString().length;
  }
  return len;
}

// Repeat the full benchmark forever and time each run, so that warmup
// becomes visible (stop with Ctrl+C).
let i=0;
while (true) {
  const start = Date.now();
  console.log(test());
  console.log(++i+" = "+(Date.now()-start)+" ms");
}

Basically, I am executing your full benchmark repeatedly and printing out how long each iteration takes:

GraalVM EE 20.3.0

$ node jsonpath-classic-benchmark.js 
100000
1 = 2485 ms
100000
2 = 2267 ms
100000
3 = 427 ms
100000
4 = 209 ms
100000
5 = 185 ms
100000
6 = 148 ms
100000
7 = 177 ms
100000
8 = 143 ms
100000
9 = 146 ms

compared to Node.js 12.18.0

$ ~/software/node-v12.18.0-linux-x64/bin/node jsonpath-classic-benchmark.js 
100000
1 = 253 ms
100000
2 = 221 ms
100000
3 = 217 ms
100000
4 = 197 ms
100000
5 = 199 ms
100000
6 = 208 ms
100000
7 = 207 ms
100000
8 = 197 ms
100000
9 = 213 ms

Admittedly, GraalVM's first 2 iterations are horrible. Iterations 3 and 4 are in the ballpark of V8. Starting with iteration 5, GraalVM is actually significantly (around 25%) faster than V8.

There's one more trick up our sleeve. In --jvm mode, the first iterations are even slower, and it takes longer to reach a good score. But after ~20 iterations, we are down to around 60ms per iteration, meaning GraalVM in JVM mode takes 0.3x the time of V8 per iteration.
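The "around 25%" and "0.3x" figures can be checked against the timings listed above. The sketch below averages iterations 5 to 9 of both runs; the choice of that window and of the mean as the summary statistic is an assumption, and the 60 ms JVM-mode value is the approximate figure quoted in the text, not a measured series.

```javascript
// Per-iteration timings in ms, copied from the two runs listed above (iterations 5-9).
const graalNative = [185, 148, 177, 143, 146];
const nodeV8 = [199, 208, 207, 197, 213];

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

const graalMean = mean(graalNative); // 799 / 5
const v8Mean = mean(nodeV8);         // 1024 / 5
const nativeRatio = graalMean / v8Mean;
const jvmRatio = 60 / v8Mean; // 60 ms: approximate JVM-mode figure quoted above

console.log(`native: ${graalMean.toFixed(1)} ms vs V8: ${v8Mean.toFixed(1)} ms`);
console.log(`native mode takes ${(nativeRatio * 100).toFixed(0)}% of V8's time`);
console.log(`JVM mode takes ${(jvmRatio * 100).toFixed(0)}% of V8's time`);
```

On these numbers, native mode needs roughly 78% of V8's time per iteration and JVM mode roughly 29%, which is consistent with the rounded figures in the comment.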

On the jsonpath-new-benchmark.js, GraalVM and V8 are roughly on par.

On regexp-benchmark.js, our engine is around 3-4x behind. Will complain to our RegExp guy to optimize this pattern :-)

Best, Christian

frank-dspeed commented 2 years ago

@wirthi https://github.com/oracle/graaljs/issues/360#issuecomment-1129109834 maybe makes this obsolete, as these performance degradations are now less of a problem than before. Even npm no longer freezes. The string update is a huge one, combined with the new default boot mode.

Osiris-Team commented 1 year ago

It would be cool if the JVM cached the generated binary from each class, so that the warmup would only happen once and not on each program restart.