H4ad opened 1 year ago
As discussed at https://github.com/libuv/libuv/issues/4098, the slowdown is not related to libuv itself; for some reason reading the actual frequency data is slow, and even cat can be slow:
time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
On my machine it took 600ms (32 CPUs), on another machine it took less than 1ms (12 CPUs), and on another it took 1.6s (16 CPUs), so this problem is very strange. If we call cat multiple times it returns quickly, but if you wait ~2s before calling it again it gets slow again.
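For reference, a minimal Node.js sketch of the same measurement (my own reconstruction, not code from the issue; it just reads the sysfs files and times it, so results will vary by machine):

```js
// Sketch: time back-to-back reads of scaling_cur_freq, then again after ~2s idle.
const fs = require('node:fs');

const paths = fs.readdirSync('/sys/devices/system/cpu')
  .filter((d) => /^cpu\d+$/.test(d))
  .map((d) => `/sys/devices/system/cpu/${d}/cpufreq/scaling_cur_freq`);

function readAll(label) {
  const start = process.hrtime.bigint();
  for (const p of paths) {
    try { fs.readFileSync(p, 'utf8'); } catch {} // some systems expose no cpufreq
  }
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${ms.toFixed(1)} ms for ${paths.length} CPUs`);
}

readAll('first read');
readAll('second read');                            // usually fast right after the first
setTimeout(() => readAll('after ~2s idle'), 2000); // often slow again
```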
I proposed adding a flag to uv_cpu_info to choose how we want to get the CPU frequency data; in theory we could go back to the old behavior of returning the approximate frequency (and maybe introduce a new API to get the real one), but unless one of the libuv maintainers reopens the issue, it will not be implemented.
In this case, unless we remove the call to uv_cpu_info, Linux machines can hit this slowdown depending on many factors, not only the CPUs.
This issue mainly affects native bindings in Node.js because they use getReport to detect whether the system is musl, but I already created PRs to avoid using it in detect-libc and napi-rs, so at least for the most popular packages it will not be an issue.
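For context, this is roughly what a getReport-based libc check looks like (a simplified sketch, not the exact detect-libc code; see the linked PRs for what actually shipped). The costly part is that process.report.getReport() assembles a full diagnostic report, including per-CPU data via uv_cpu_info, just to answer "glibc or musl?":

```js
// Simplified sketch of a getReport-based libc family check.
function libcFamilyFromReport() {
  if (!process.report) return null;            // report support can be disabled
  const report = process.report.getReport();   // builds the whole diagnostic report
  if (report.header && report.header.glibcVersionRuntime) {
    return 'glibc';
  }
  if (Array.isArray(report.sharedObjects) &&
      report.sharedObjects.some((o) => o.includes('musl'))) {
    return 'musl';
  }
  return null;
}

console.log(libcFamilyFromReport());
```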
I just ran npm install xxx on a 384-core machine and it got stuck seemingly forever, and then I found this issue.
uv_cpu_info keeps getting called and I don't know when it will stop.
root@6c92459bf81a:/# node a.js
Took 3862.274124ms
root@6c92459bf81a:/#
Oh, that's why I saw random slowdowns/hangs on my machine, probably introduced by https://github.com/npm/npm-install-checks/commit/ebf9b9f4c08035d8e2c41cd9bc3302cd8bdc9184
/cc @nodejs/npm
We can use the implementation that I created for https://github.com/lovell/detect-libc/pull/19; it is much faster and solves the slowdown caused by getReport.
I will probably open a PR soon (when I have more time to investigate and confirm the issue).
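One cheaper direction (a sketch of the general idea only; see the linked PR for what was actually merged) is to inspect the filesystem instead of building a diagnostic report, e.g. by looking at the ldd script or the musl dynamic loader:

```js
// Sketch: detect the libc family from the filesystem instead of getReport.
const { readFileSync, readdirSync } = require('node:fs');

function libcFamilyFromFilesystem() {
  try {
    // On glibc systems /usr/bin/ldd mentions "GNU C Library"; on Alpine it
    // references the musl loader.
    const ldd = readFileSync('/usr/bin/ldd', 'utf8');
    if (ldd.includes('musl')) return 'musl';
    if (ldd.includes('GNU C Library')) return 'glibc';
  } catch {}
  try {
    // Fallback: the musl dynamic loader lives at /lib/ld-musl-<arch>.so.1.
    if (readdirSync('/lib').some((f) => f.startsWith('ld-musl-'))) return 'musl';
  } catch {}
  return null; // unknown; a caller could still fall back to getReport here
}

console.log(libcFamilyFromFilesystem());
```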
For anyone running into this, here's a workaround (this assumes glibc, e.g. Ubuntu; change 'glibc' to 'musl' if you are using Alpine):
cat <<EOF >> /usr/lib/node_modules/npm/node_modules/npm-install-checks/lib/current-env.js
function libc(osname) {
  return 'glibc';
}
EOF
npm install -g your-favorite-app
Related #55576 with fix #55602
Are you 100% sure this issue is with uv_cpu_info?
We really do need to add options to getReport so we can choose specifically which info we are looking for, to avoid slowdowns.
@Tofandel the network could have an effect, but the main issue is probably uv_cpu_info, which we can't avoid without introducing a new flag; I think that would be the best option, along with updating the npm code.
But I can try building Node.js from your branch later to confirm.
Is that 384 cores or 384 logical processors? I'm guessing if you have 384 cores you have double that in logical processors, uv_cpu_info runs for each core synchronously, and the function might not be very fast.
4 seconds for a debug function would be considered long but acceptable; but since it's not only used as a debug function, instead of adding more and more flags to it we should just add options and, if possible, make it async.
@Tofandel My bad, I first thought SMT was off and those were 192 real cores; it's actually a 1-way node with SMT on:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 1
Stepping: 1
Frequency boost: enabled
CPU max MHz: 3707.8120
CPU min MHz: 1500.0000
BogoMIPS: 4800.18
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization features:
Virtualization: AMD-V
Caches (sum of all):
L1d: 3 MiB (96 instances)
L1i: 3 MiB (96 instances)
L2: 96 MiB (96 instances)
L3: 384 MiB (12 instances)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-191
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Reg file data sampling: Not affected
Retbleed: Not affected
Spec rstack overflow: Mitigation; safe RET
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Not affected
It seems it's actually 96 real cores and 192 logical cores (1 socket with 96 cores per socket * 2 threads per core).
Is it maybe an AMD-specific issue? It would be nice to see if there's a common factor, to know where to look.
Hmmm, you are right, I thought it was a 2-way machine.
It seems it's just randomly slow for me as well (8 real cores, Intel), so it's not just AMD:
time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
4000009
3999975
4000000
4000000
4000031
3999999
4000040
4000113
real 0m0.010s
user 0m0.007s
sys 0m0.003s
time sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq
4083445
4027384
4078306
4092856
4023428
4094010
4097951
4153294
real 0m0.166s
user 0m0.009s
sys 0m0.001s
It seems to depend on what the CPU is doing at the moment; if the CPU is at rest with a lot of idle time, the read becomes slow.
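A rough way to check that from Node (my own sketch, assuming that busy-spinning one worker thread per logical CPU is enough to keep the cores out of deep idle states):

```js
// Sketch: time the sysfs reads while the CPUs are idle, then under artificial load.
const fs = require('node:fs');
const { Worker } = require('node:worker_threads');

const paths = fs.readdirSync('/sys/devices/system/cpu')
  .filter((d) => /^cpu\d+$/.test(d))
  .map((d) => `/sys/devices/system/cpu/${d}/cpufreq/scaling_cur_freq`);

function timeReads(label) {
  const start = process.hrtime.bigint();
  for (const p of paths) {
    try { fs.readFileSync(p, 'utf8'); } catch {}
  }
  console.log(`${label}: ${(Number(process.hrtime.bigint() - start) / 1e6).toFixed(1)} ms`);
}

timeReads('idle');

// Busy-spin one worker per logical CPU for ~3 seconds, then measure again mid-spin.
const spin = 'const end = Date.now() + 3000; while (Date.now() < end);';
const workers = paths.map(() => new Worker(spin, { eval: true }));

setTimeout(() => {
  timeReads('busy');
  workers.forEach((w) => w.terminate());
}, 1000);
```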
Script:
On Node 20.2.0:
On Node 20.3.0:
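The script and the per-version numbers aren't reproduced above; as a rough reconstruction of the kind of measurement being made (my assumption, not the original script), something like this times os.cpus(), which ends up calling uv_cpu_info:

```js
// bench.js - time a single os.cpus() call (os.cpus() uses uv_cpu_info under the hood).
const os = require('node:os');

const start = process.hrtime.bigint();
const cpus = os.cpus();
const ms = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`os.cpus() returned ${cpus.length} entries in ${ms.toFixed(3)} ms`);
```

Running the same file under both Node versions (e.g. via nvm) lets you compare the timings directly.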
I did a bisect and the commit that introduced this slowdown was https://github.com/nodejs/node/commit/bfcb3d1d9a876f399013d326bd65804f9eda77e4, which is the libuv upgrade.
But the actual problem was introduced (I think) at https://github.com/libuv/libuv/commit/434eb4b0ac29e853e85562119fda3fa681642f17: the slowdown is caused because, instead of assuming the same information for all CPUs, libuv now reads /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq for every CPU.
I'll also open an issue on libuv, but I want to keep this one open just to track the slowdown.
My machine:
cc @nodejs/performance