Open ggascoigne opened 2 years ago
BTW I realise that without a project to test this on this will be difficult to do anything with, I'm more than happy to jump through hoops to try and get you any further information that can help pin this down if it comes to that.
Building the open source Ace (Ajax.org Cloud9 Editor) also exhibits the problem. Steps to reproduce on an arm64 Mac:
git clone https://github.com/ajaxorg/ace
cd ace
npm install
make clean && node Makefile.dryice.js normal --m --nc
I can't test locally but I know the cause: V8 on arm64 assumes there's at most 864 kb of stack space (984 kb on other architectures.)
Check what ulimit -s
prints and start node with node --stack_size=<kb> app.js
. Important: if you set it too high, you risk crashes.
@bnoordhuis can we close this issue or there's something Node.js can do?
Node should probably check getrlimit(RLIMIT_STACK)
at start-up.
864 kb (kilobits)? This is highly unusual, stack is normally 4096 or 8192 KiB, in most settings.
I do have a public reproducer for you, first reported in Debian. I can reproduce this on the Debian/arm64 porterbox:
Linux amdahl 5.10.0-21-arm64 #1 SMP Debian 5.10.162-1 (2023-01-21) aarch64 GNU/Linux
processor : 0
BogoMIPS : 100.00
Features : fp asimd evtstrm cpuid
CPU implementer : 0x50
CPU architecture: 8
CPU variant : 0x0
CPU part : 0x000
CPU revision : 1
Here’s it using the latest nodejs release (though I’m still using Babel etc. from Debian because that’s quicker for me right now than using npm):
(sid_arm64-dchroot)tg@amdahl:~$ wget https://github.com/danvk/dygraphs/archive/refs/tags/v2.2.0.tar.gz
(sid_arm64-dchroot)tg@amdahl:~$ tar xaf v2.2.0.tar.gz
(sid_arm64-dchroot)tg@amdahl:~$ cd dygraphs-2.2.0
(sid_arm64-dchroot)tg@amdahl:~/dygraphs-2.2.0$ env NODE_PATH=/usr/share/node_modules ../node-v19.6.0-linux-arm64/bin/node /usr/bin/babeljs --config-file $PWD/babel.config.json --compact false --source-maps inline -d tests5 auto_tests
RangeError: Maximum call stack size exceeded
at isBlockStatement (/usr/share/nodejs/@babel/types/lib/validators/generated/index.js:381:26)
at isScope (/usr/share/nodejs/@babel/types/lib/validators/isScope.js:9:39)
at NodePath.isScope (/usr/share/nodejs/@babel/traverse/lib/path/lib/virtual-types-validator.js:107:10)
at NodePath.getScope (/usr/share/nodejs/@babel/traverse/lib/path/index.js:85:17)
at NodePath.setScope (/usr/share/nodejs/@babel/traverse/lib/path/context.js:115:21)
at NodePath.setContext (/usr/share/nodejs/@babel/traverse/lib/path/context.js:128:8)
at NodePath.pushContext (/usr/share/nodejs/@babel/traverse/lib/path/context.js:183:8)
at TraversalContext.visitQueue (/usr/share/nodejs/@babel/traverse/lib/context.js:78:14)
at TraversalContext.visitSingle (/usr/share/nodejs/@babel/traverse/lib/context.js:65:19)
at TraversalContext.visit (/usr/share/nodejs/@babel/traverse/lib/context.js:109:19)
(sid_arm64-dchroot)tg@amdahl:~/dygraphs-2.2.0$ env NODE_PATH=/usr/share/node_modules ../node-v19.6.0-linux-arm64/bin/node --stack-size="$(ulimit -s)" /usr/bin/babeljs --config-file $PWD/babel.config.json --compact false --source-maps inline -d tests5 auto_tests
Successfully compiled 59 files with Babel (9209ms).
An npm i
after the cd
should do the trick for you as well.
This probably affects many; I’ve seen reports of people trying to use some äpps on Android on arm64 running into this bug. I found it during Debian reproducible-builds testing. Others run into this because they use arm64 machines for development.
Hi James,
(you might wish to Cc @.***> so they actually get the reply…)
Are you able to determine whether https://github.com/nodejs/node/issues/41163 (and/or any of the guidance within that thread) seems relevant to this bug?
It appears so. I commented there, thank you for finding that link.
I guess there is even a… quick patch… to make from this. I admit I’m very confused by that statement:
“if you set it too high, you risk crashes”
That can’t be right.
Searching through the nodejs source for where this stack size is set, I see multiple time bombs for all architectures.
deps/v8/src/common/globals.h does set the default stack size to 864/984 KiB in order to achieve an about 1 MiB stack for JS plus C++ parts combined.
I wonder if this shouldn’t be getrlimit(RLIMIT_STACK) - overhead, and then define the per-architecture overhead in a suitable way.
That lower 1 MiB total limit seems to be for Windows. The lower arm64 limit is caused by “allocating MacroAssembler takes 120 [KiB]” but the total could still be raised I think… at least on unixoid platforms other than WebView-on-Android. Since the location of these defaults is in v8, it also applies for browsers and whatnot, but nodejs could indeed inspect the current ulimit and set a better default for at least nōn-Windows systems.
I’m not, unfortunately, in the position to provide a quick patch, being a C developer, not CFrustFrust, and all that. I think that InitializeNodeWithArgs in src/node.cc, which already has a call to V8::SetFlagsFromString(NODE_V8_OPTIONS, …), is the likely place for adding code (suitably platform-ifdef’d) that does:
Might need to adjust some tests, too :~
Good luck, //mirabilos --
ooooh and I think I get why “risk crashes” is mentioned here: setting --stack_size="$(ulimit -s)"
omits the subtraction of the per-arch CFrustFrust overhead, so that’s not a consideration for the proposed way to fix this
“if you set it too high, you risk crashes”
That can’t be right.
--stack_size=...
doesn't set the stack size, it just tells V8 to assume it's this big. Perennial source of confusion.
Ben Noordhuis dixit:
--stack_size=...
doesn't set the stack size, it just tells V8 to assume it's this big. Perennial source of confusion.
Aieee.
OK, then setting it to the ulimit is probably right?
Eh I’ll leave wondering about details to the devs.
bye, //mirabilos --
Why don't you use JavaScript? I also don't like enabling JavaScript in Because I use lynx as browser. +1 -- Octavio Alvarez, me and ⡍⠁⠗⠊⠕ (Mario Lang) on debian-devel
James Addison dixit:
Maybe it's rare to propose 'do nothing' as a technical suggestion but I think it is worth considering here, since we are not the arbiters of Node.
It’s still a release-critical bug in Debian which impacts arm64 builders including reproducible-builds. I would see this fixed in bookworm, at least by a band-aid (raising these limits by default); the proper fix can be revisited after the release, and in coordination with upstream.
We know the default ulimits for users in Debian, and they are higher than the 1 MiB assumed by v8, by quite some factor, so this won’t break things which are not currently broken (by that exception). This will do for the release I think.
If a proper, upstream-supported, fix arrives within $time it’s even possible that a bookworm-security update includes that. Still thinking along the line of getrlimit + subtract known arch-specific value for the fix here, whatever makes sense with the v8 setting anyway…
bye, //mirabilos -- Gestern Nacht ist mein IRC-Netzwerk explodiert. Ich hatte nicht damit gerechnet, darum bin ich blutverschmiert… wer konnte ahnen, daß SIE so reagier’n… gestern Nacht ist mein IRC-Netzwerk explodiert~~~ (as of 2021-06-15 The MirOS Project temporarily reconvenes on OFTC)
Package: nodejs Followup-For: Bug #1030284 X-Debbugs-Cc: @., @.
mirabilos gesagt:
We know the default ulimits for users in Debian, and they are higher than the 1 MiB assumed by v8, by quite some factor, so this won’t break things which are not currently broken (by that exception). This will do for the release I think.
Relaying my understanding of this, so far:
An increase in the V8 stack size should not cause earlier-process-exits for any processes that previously ran on Debian systems where the architecture-default-or-greater stack size is configured[1].
In other words: the same-number-or-greater of JavaScript processes should continue to run on any given Debian system where the configured stack size is greater-than-or-equal to the architecture's default, after the V8 stack size limit is increased.
And we expect that it should repair a failing reproducible build test for at least one Debian package on arm64.
[1] - see limits.conf
That comment was written and sent by me. I didn't realize that the GitHub reply-to email address is user-specific and that messages sent to it will appear with them as the author.
Patches are welcomed downstream by the Debian maintainer (see https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030284), and there is a plan to re-build and re-test affected Debian packages with any selected patch(es) before the hard freeze for the upcoming bookworm
release (that freeze is likely to occur on or around 2023-03-12).
I've offered an untested and not-yet-accepted patch -- a mitigation, not a fix -- to Debian, that removes the preprocessor-defined smaller stack size for ARM-based architectures.
That patch is also available as pull request #46896 for this repository (currently opened against the main
branch of Node).
The patch maintains the existing static-constant definition of stack size limits (that is to say: it does not introduce dynamic rlimit-based querying), since I'm not particularly familiar with the details of C, V8, NodeJS, or stack limits.
The following bugs are referenced from comments in the section of code that the patch removes:
@jayaddison
(sid_arm64-dchroot)kapouer@amdahl:~/test/dygraphs-2.2.0$ NODE_PATH=/usr/share/node_modules ../../nodejs-18.13.0+dfsg1/node /usr/bin/babeljs --config-file $PWD/babel.config.json --compact false --source-maps inline -d tests5 auto_tests
Successfully compiled 59 files with Babel (10442ms).
your patch works, thank you.
@kapouer you're welcome, thanks for testing it
https://crbug.com/405338 (I haven't read the contents of this bug yet)
A request for help: has anyone else been able to read the contents of this bug and to determine whether they remain relevant for ARM and Node? (I don't seem to be able to read it)
FWIW, I have (mildly) elevated privileges but I can't read it either.
Thanks for checking @bnoordhuis.
Although information from that bug might be useful, it turns out that there is some relevant public changeset and code review information available.
https://codereview.chromium.org/555943003 - a V8 changeset for ARM - was initially proposed as a change to reduce the size of some allocated buffers.
It was closed with a preference for the easier and safer https://codereview.chromium.org/583163002/ approach instead -- introducing the reduced stack size on ARM.
Increasing the stack size again (as #46896 proposes) may fix the issue described in this thread, but would also be likely to re-introduce whatever bug those two changesets were aiming to resolve.
I'll try to formulate a suitable question to post to a V8 contributors mailing list with a summary of the situation.
James Addison dixit:
Increasing the stack size again (as #46896 proposes) may fix the issue described in this thread, but would also be likely to re-introduce whatever bug those two changesets were aiming to resolve.
Probably — iff the bug in question applies to the nodejs use case at all in the first place.
Best to ask them what bug, and where it applies to, as well.
Thanks for your persistence, //mirabilos -- Solange man keine schmutzigen Tricks macht, und ich meine wirklich schmutzige Tricks, wie bei einer doppelt verketteten Liste beide Pointer XORen und in nur einem Word speichern, funktioniert Boehm ganz hervorragend. -- Andreas Bogk über boehm-gc in d.a.s.r
I'm having difficulty posting to the v8-dev
mailing list at the moment; if anyone's able to post a message there on my behalf (or suggest changes to this one), here's what I'd like to send:
Subject: Question: stack size on ARM systems
Hi folks,
Debian bug #1030284[1] and the related NodeJS GitHub issue #41163[2]
report ARM-specific RangeError exceptions from the vendored V8 library
in NodeJS. The bug(s) are reproducible with v18.13 of NodeJS.
The cause of the difference-in-behaviour appears to be that V8 sets[3]
a lower stack size for ARM: 864K as compared to 984K.
Would it be safe to increase the stack size on ARM to 984K to restore
consistency with most other architectures?
(I've offered a patch to make that change, and others have confirmed
that it allows a provided repro case to pass; I'm worried about any
potential unsafe side-effects of the change, though)
Thanks,
James
[1] - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030284
[2] - https://github.com/nodejs/node/issues/41163
[3] - https://github.com/nodejs/node/blob/2bb4b59fa5529569ad38d3bf7d36666c926d8e47/deps/v8/src/common/globals.h#L74-L86
(an earlier draft included references the codereview change history, bug reports, and so on -- after a while, though, it seemed like a more concise message might be more respectful of people's time)
Edit: add missing third footnote
v8-dev is moderated. If you're a first-time poster, it can take up to a day for your post to get accepted. Its moderators predominantly follow CET working hours.
In short summary from a helpful response on the v8-dev
mailing list: no, it isn't currently considered safe to restore the 984KB stack size on ARM64 (it may be OK on 32-bit ARM, but not 64-bit). Also: if the RangeError
condition is caused by recursive code, then it could be worthwhile to refactor that code into a non-recursive implementation.
James Addison dixit:
In short summary from a helpful response on the
v8-dev
mailing list: no, it isn't currently considered safe to restore the 984KB stack size on ARM64 (it may be OK on 32-bit ARM, but not 64-bit).
In which environments and with which rationale?
Given the default ulimit on Debian is many times the mentioned 1 MiB from the source code comment…
Also: if the
RangeError
condition is caused by recursive code, then it could be worthwhile to refactor that code into a non-recursive implementation.
I wouldn’t know, ask the babel7 upstream developers. This is not something that can be done for bookworm though, so in Debian, we need to achieve at least parity with the other architectures and that in a way that doesn’t break anything currently in the tree.
bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)
Thanks @mirabilos - I'm not likely to look at this issue again for at least a week; please see (and/or join, if you like) the v8-dev
mailing list for the recent context.
The suggestion made in https://groups.google.com/g/v8-dev/c/7ZI3vxtabcU/m/c9qvHkOBBAAJ was to float a patch in node but I don't think we want to be in the business of second-guessing V8. As a general rule, we don't float patches that haven't been accepted upstream.
If downstream Debian wants to float a patch, that's fine, of course.
@laverdet is isolated-vm affected when node sets the --stack_size=...
flag to some at-runtime-determined value? Effectively V8 will assume a 2-4 MB system stack instead of the 800-900k it defaults to today.
Node's own worker_threads module uses 4 MB thread stacks but I couldn't quite divine what isolated-vm does.
isolated-vm instantiates threads with the built-in C++11 std::thread
constructor, so pthreads on macOS. I get the stack base using platform-specific invocations: https://github.com/laverdet/isolated-vm/blob/main/src/isolate/environment.cc#L27-L48 and then feed that to v8 minus 24kb of padding [experimentally derived] https://github.com/laverdet/isolated-vm/blob/main/src/isolate/environment.cc#L239-L247 [the comment here says 512kb but I just tested again and they increased it to 524kb at some point]. The implementation for GetStackBase
in isolated-vm doesn't do anything in Windows because the stack sizes on that platform are more than enough to support v8's defaults.
Ah okay, so isolated-vm uses v8::Isolate::SetStackLimit()
and default thread stacks. Understood, thanks.
That means runtime tuning of --stack_size=...
has to be conservative, so < 2 MB on linux and macos and $deity knows how little with musl libc.
Honestly, tweaking the flag seems fraught with peril. I'm kind of loath to make the change.
Looping back some discussion from downstream in Debian - particularly some comments by @mirabilos around the fact that V8 accepts stack-size options on the command-line:
Would it make sense to make a call to getrlimit(RLIMIT_STACK)
from somewhere around (probably just after) this code block: https://github.com/nodejs/node/blob/2bb4b59fa5529569ad38d3bf7d36666c926d8e47/src/node.cc#L781-L786
... and then to set the V8 --stack-size
option when a non-infinite limit is successfully found?
(I have written a sketchy draft of a patch to do that, but not yet built or tested it yet, and would welcome any feedback on whether it's a sensible approach in the first place)
I've had a change of heart and do not wish to participate further with this bug, and will unsubscribe. Please feel free to make use of any of the patches I've suggested downstream in Debian, but do not merge them as-is (apply corrections first). Thank you.
Would it make sense to make a call to getrlimit(RLIMIT_STACK)
https://github.com/nodejs/node/issues/41163#issuecomment-1467889659 - i.e., that won't work because of threads. RLIMIT_STACK only applies to the main thread.
James Addison dixit:
... to add something like this:
Ouch, by going via a string?! I wouldn’t have thought of that…
if (!(flags & ProcessInitializationFlags::kNoAdjustResourceLimits)) { struct rlimit lim; if (getrlimit(RLIMIT_STACK, &lim) == 0) { char stackSize[32];
32 is magic and may also be wrong here.
int buflen = snprintf(stackSize, sizeof(stackSize),
"--stack-size=%d", lim.rlim_cur);
%d isn’t right for lim.rlim_cur, which is of type rlim_t. --stack-size seems to take in KiB, and we’d want a reserve.
if (buflen < sizeof(stackSize)) { V8::SetFlagsFromString(stackSize, buflen); }
} }
So, taking your next post into account, probably something more like this:
if (!(flags & ProcessInitializationFlags::kNoAdjustResourceLimits)) { struct rlimit lim; char stackSize[sizeof("--stack-size=") + / 2³¹ / 10];
if (getrlimit(RLIMIT_STACK, &lim))
fprintf(stderr, "W: stack size adjustment: cannot get RLIMIT_STACK\n");
else if (lim.rlim_cur == RLIM_INFINITY)
fprintf(stderr, "W: stack size adjustment: RLIMIT_STACK is unlimited\n");
else if ((lim.rlim_cur /= 1024) <= V8_STACK_RESERVE)
fprintf(stderr, "W: stack size adjustment: RLIMIT_STACK is too small\n");
else if ((lim.rlim_cur -= V8_STACK_RESERVE) >= /* 2³¹ */ 2147483647)
fprintf(stderr, "W: stack size adjustment: RLIMIT_STACK is too large\n");
else
V8::SetFlagsFromString(stackSize, snprintf(stackSize, sizeof(stackSize),
"--stack-size=%d", (int)lim.rlim_cur));
}
Untested, still not back to full health, take with grains of salt.
bye, //mirabilos -- In traditional syntax ' is ignored, but in c99 everything between two ' is handled as character constant. Therefore you cannot use ' in a preproces- sing file in c99 mode. -- Ragge No faith left in ISO C99, undefined behaviour, etc.
Ben Noordhuis dixit:
Would it make sense to make a call to getrlimit(RLIMIT_STACK)
https://github.com/nodejs/node/issues/41163#issuecomment-1467889659 - i.e., that won't work because of threads. RLIMIT_STACK only applies to the main thread.
Hmm, but thread stack size can also be configured.
So the patch needs to ensure that the thread stack size is sufficiently large and then configure v8 based on that. Hm.
Is there anyone who understands all of this?
So, there are a couple of things node can do here.
For threads that node manages itself (including the main thread), it knows approximately how big the stack is and it can inform V8 about that through the v8::ResourceConstraints::set_stack_limit()
API.
For threads that are not under node's control, we either do nothing or do something very conservative. I mean, you can try to probe the stack but that turns into a combinatorial explosion of platform- and architecture-specific code quickly.
Apropos the --stack_size
switch, that's a per process setting, not per thread. I'd just leave it untouched. If people want to override the defaults, they presumably / hopefully know what they're doing.
Version
v16.13.1
Platform
Darwin USAGPM.local 21.2.0 Darwin Kernel Version 21.2.0: Sun Nov 28 20:28:41 PST 2021; root:xnu-8019.61.5~1/RELEASE_ARM64_T6000 arm64
Subsystem
No response
What steps will reproduce the bug?
I have a project with about 1000 typescript files (it's private so I can't point you at it, sorry). It's a react project that uses create-react app.
The arm version of v16 gives:
The x86 version works fine. I've tried every arm version of v16 that I can find, all have the same issue, every x86 version works with no problem. Running
yarn tsc
explicitly works fine with both the arm and the x86 version.How often does it reproduce? Is there a required condition?
With the affected project it happens 100% of the time, with other projects I can't reproduce this.
What is the expected behavior?
x86 and arm behavior should be identical.
What do you see instead?
No response
Additional information
No response