nodejs / node

Node.js JavaScript runtime ✨🐢🚀✨
https://nodejs.org
Other
105.55k stars 28.63k forks source link

IO_uring caused regression on node 20 and node 21 #51875

Open lucshi opened 4 months ago

lucshi commented 4 months ago

Version

v21.5.0

Platform

Linux wpe-icx 6.2.0-26-generic #26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Subsystem

libuv, benchmark

What steps will reproduce the bug?

  1. Run benchmark/url/legacy-url-get-prop.js on node 16
  2. Run benchmark/url/legacy-url-get-prop.js on node 21 or node 20
  3. The bench has huge downgrade
bench                                       downgrade   
url/legacy-url-get-prop.js e=1 type='auth' *** -31.12 % 
url/legacy-url-get-prop.js e=1 type='dot' *** -24.22 % 
url/legacy-url-get-prop.js e=1 type='file' *** 10.24 % 
url/legacy-url-get-prop.js e=1 type='idn' *** -22.35 % 
url/legacy-url-get-prop.js e=1 type='javascript' * -3.92 % 
url/legacy-url-get-prop.js e=1 type='long' *** -24.71 %
url/legacy-url-get-prop.js e=1 type='percent' 3.46 % 
url/legacy-url-get-prop.js e=1 type='short' *** -21.85 % 
url/legacy-url-get-prop.js e=1 type='wpt' *** 14.60 % 
url/legacy-url-get-prop.js e=1 type='ws' *** -30.91 %

How often does it reproduce? Is there a required condition?

Always after libuv introduced IO_uring

What is the expected behavior? Why is that the expected behavior?

Expected not downgrade performance.

What do you see instead?

url/legacy-url-get-prop.js e=1 type='auth' -31.12 % ±5.13% ±6.82% ±8.88% url/legacy-url-get-prop.js e=1 type='dot' -24.22 % ±5.18% ±6.90% ±8.98% url/legacy-url-get-prop.js e=1 type='file' 10.24 % ±5.04% ±6.74% ±8.84% url/legacy-url-get-prop.js e=1 type='idn' -22.35 % ±3.64% ±4.87% ±6.38% url/legacy-url-get-prop.js e=1 type='javascript' * -3.92 % ±3.71% ±4.95% ±6.47% url/legacy-url-get-prop.js e=1 type='long' -24.71 % ±2.76% ±3.68% ±4.80% url/legacy-url-get-prop.js e=1 type='percent' 3.46 % ±3.64% ±4.84% ±6.30% url/legacy-url-get-prop.js e=1 type='short' -21.85 % ±2.40% ±3.19% ±4.15% url/legacy-url-get-prop.js e=1 type='wpt' 14.60 % ±6.32% ±8.43% ±11.00% url/legacy-url-get-prop.js e=1 type='ws' -30.91 % ±6.89% ±9.18% ±11.97%

Additional information

Performance counter stats for './node_21x node/benchmark/url/legacy-url-get-prop.js':

      1,034.49 msec task-clock                       #    1.726 CPUs utilized
           976      context-switches                 #  943.462 /sec
             5      cpu-migrations                   #    4.833 /sec
        49,576      page-faults                      #   47.923 K/sec
 3,083,797,172      cycles                           #    2.981 GHz
 9,467,384,792      instructions                     #    3.07  insn per cycle
 2,322,558,800      branches                         #    2.245 G/sec
    11,142,895      branch-misses                    #    0.48% of all branches
15,393,247,090      slots                            #   14.880 G/sec
 8,576,922,709      topdown-retiring                 #     54.3% Retiring
 2,043,750,331      topdown-bad-spec                 #     12.9% Bad Speculation
 3,800,218,341      topdown-fe-bound                 #     24.1% Frontend Bound
 1,377,534,362      topdown-be-bound                 #      8.7% Backend Bound

   0.599485294 seconds time elapsed

   0.434175000 seconds user
   0.605461000 seconds sys

Performance counter stats for './node_16x node/benchmark/url/legacy-url-get-prop.js':

        724.30 msec task-clock                       #    1.118 CPUs utilized
           536      context-switches                 #  740.030 /sec
             1      cpu-migrations                   #    1.381 /sec
        62,235      page-faults                      #   85.925 K/sec
 2,160,150,088      cycles                           #    2.982 GHz
 3,575,740,708      instructions                     #    1.66  insn per cycle
   644,209,048      branches                         #  889.429 M/sec
    18,904,357      branch-misses                    #    2.93% of all branches
10,799,731,125      slots                            #   14.911 G/sec
 3,462,464,898      topdown-retiring                 #     30.8% Retiring
 2,809,012,225      topdown-bad-spec                 #     25.0% Bad Speculation
 3,403,325,555      topdown-fe-bound                 #     30.3% Frontend Bound
 1,564,313,859      topdown-be-bound                 #     13.9% Backend Bound

   0.647607919 seconds time elapsed

   0.579179000 seconds user
   0.148602000 seconds sys

Basic uARC data shows node21 introduced 3X instructions compared with node16

Flamegraph of node 16 (normal) image

Flamegraph of node 21(downgrade) image

There are many network operations introduced in Node 21 which is time costy, please see below attachment raw files.

perf-kernel 16 ins perf-kernel 21 ins

lpinca commented 4 months ago

Isn't io_uring currently used only for file operations?

richardlau commented 4 months ago

Yes, io_uring support was only added for FS operations.

FWIW as of the 14 Feb security releases (Node.js 21.6.2, 20.11.1), io_uring support is turned off by default.

mcollina commented 4 months ago

The benchmark in question does not do any I/O: https://github.com/nodejs/node/blob/main/benchmark/url/legacy-url-get-prop.js

santigimeno commented 4 months ago

By looking at the graphs, my interpretation is that for every child process, a new SQPOLL-enabled ring is created which may cause a measurable perf hit. See discussion in https://github.com/libuv/libuv/issues/4308 and mitigation measures in https://github.com/libuv/libuv/pull/4315. (This last fix hasn't made to a libuv release yet)