nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
107.66k stars 29.63k forks source link

buffer.write performance issue #35386

Closed ledbit closed 3 years ago

ledbit commented 4 years ago

What steps will reproduce the bug?

I have not been able to create an isolated example outside of the application that reproduces the perf degradation - likely due to optimization.

When profiling our application (Cribl LogStream) we noticed that the top function call was a function called hidden - after some digging it turns out that the call trace is something like this

 Buffer.write
   validateInt32
     hideStackFrames
        hidden

image

after modifying the application to all the undocumented Buffer.utf8Write instead of Buffer.write we see about 20% overall improvement and the heavy bottom profile looks like follows - note during both times the application was profiled for same amount of time (30s).

image

I noticed the same performance improvement after updating hideStackFrames to look like this:

function hideStackFrames (fn) { return fn; } 

I have not been able to reproduce the perf degradation using a script that isolates just Buffer.write operations. I don't even see the hidden function calls at all during profiling. However, when I set a breakpoint in hideStackFrames and then start profiling I do end up seeing hidden in the profile - which make me think there's some optimization/compilation/inlinning issue at play.

UPDATE 9/28 I was able to repro the perf degradation by disabling inlining

buf$ node  buffer.js 
Buffer.write: 787.863ms
Buffer.utf8Write: 741.131ms

buf$ node --max-inlined-bytecode-size=0  buffer.js 
Buffer.write: 1233.917ms
Buffer.utf8Write: 781.519ms

here's how buffer.js looks like

const b = Buffer.alloc(1000*1000);
const str = '12345' + '67890';

console.time('Buffer.write');
let off = 0;
for(let i=0;i<1e7; i++) {
    off += b.write(str, off);
    if(off >= b.length) off = 0;
}
console.timeEnd('Buffer.write');

console.time('Buffer.utf8Write');
off = 0;
for(let i=0;i<1e7; i++) {
    off += b.utf8Write(str, off);
    if(off >= b.length) off = 0;
}
console.timeEnd('Buffer.utf8Write');

Could this mean that the default V8 inline settings are too conservative for the server side?

ledbit commented 4 years ago

Update:

  1. confirmed same performance issue is present in 14.13.0
  2. confirmed that using --stress-inline has no effect on perf

Any thoughts on what other options to try? For the short term, and since performance hit is so great we're calling utf8Write directly, but we'd rather not rely on a non-public API function

gireeshpunathil commented 4 years ago
BridgeAR commented 4 years ago

@nodejs/v8 is there a way to improve the inlining for cases like this one? And could the try ... finally block be the fault to prevent inlining the code?

ledbit commented 4 years ago

@BridgeAR - I tried removing try ... finally unfortunately it had no effect in performance.

const hideStackFrames = function(fn) {
  return function hidden(...args) {
    return fn(...args);
  }
}
joyeecheung commented 4 years ago

I wonder if hiding the validator frames is worth the performance cost (if I am reading the profiling results correctly)

puzpuzpuz commented 4 years ago

I wrote this benchmark. Its baseline result for latest master on my machine is:

$ ./node benchmark/misc/hidestackframes.js 
misc/hidestackframes.js n=100000 type="no-error": 20,704,220.948820617
misc/hidestackframes.js n=100000 type="error": 45,215.81646655569

With the following implementation

function hideStackFrames(fn) {
  return fn;
}

it shows:

$ ./node benchmark/misc/hidestackframes.js 
misc/hidestackframes.js n=100000 type="no-error": 48,594,383.16962696
misc/hidestackframes.js n=100000 type="error": 45,900.658629285084

So, there is a certain penalty introduced by hideStackFrames. I'm currently thinking of ways to move the penalty to the exception-path instead of success-path, while keeping the same behavior.

Update. I've tried re-running the benchmark and I can no longer see any difference, so please ignore this message. Disabling inlining with --max-inlined-bytecode-size=0 also doesn't make any difference, except for reducing results for both hideStackFrames implementations.

mmomtchev commented 4 years ago

In fact, this is significant enough only for the shortest and simplest library calls, @ledbit 's example uses 8 bytes buffers, but still, why not

ledbit commented 3 years ago

Seeing this on another Buffer code-path

image