Open jussikan opened 10 years ago
I'm fairly certain that this specific CPU doesn't implement the FPU instructions that the V8 runtime (inside of Node.js) expects to be there. This is effectively a duplicate of a couple of bugs against the v8ppc code https://github.com/andrewlow/v8ppc/issues?state=open
I'll leave it open as a catcher for people trying to get Node on the Synology devices.
You do have the option of building a 'simulated' version (I suggest you read through some of the open issues for help on that). It'll run about 6x slower than native, but it will work.
Yes - eventually we will address this. It's a matter of priorities.
I ran into this on my G5 trying to build nodejs as a build dependency for firefox. The illegal instruction I end up halting on is friz. Just out of curiosity, is there any intrinsic value in using the POWER5+ only instruction for rounding doubles to integers as opposed to using the PPC/PPC64 equivalent? I think the older instructions work with POWER as well (they were in POWER4, I think), so it seems that the only reason to use this instruction is to make it incompatible with PowerPC machines. Obviously, I don't think that was the intent, but it seems odd to use this instruction instead.
Edit: ahh, from the looks of it, friz takes in an optional bit whether or not to set the state in condition register 1. From what I've read, this condition register has effects on the overall superscalar pipeline. So perhaps the instructions per clock count ends up taking a dip. Still, given that there's hacks in v8 to prevent frim from happening, I think it's just an oversight that friz is being called.
Hi @KungFuJesus, yeah, using friz
and other mode explicit instructions allows us to skip setting the rounding mode in the floating point status and control register, which would cause slowdown of the pipeline.
As for the frim
hacks you mentioned, I suspect you are referring to the FPU
CPU check in V8. Looking at the latest V8 master, despite the check, we have not implemented the fall-back path when this feature is not available. So we currently pre-req POWER5+, and will unconditionally generate frim
and friz
.
Andrew's suggestion about using simulator would still work, but it's not a performant option. Given that things haven't really changed in this regards since 2014, not sure when we'd be able to prioritize this work... That being said, this is an open-source project after all...contributions are welcomed! ;-)
Now that I need this to even attempt to build firefox (supposedly skia support was fixed for big endian architectures at some point, I'll have to see about this), I'm probably going to make an attempt at least at a patch. A very significant difference I also noticed is that the PPC variations of those instructions are floating point to fixed point instructions, rather than floating point to floating point. I think converting back to fp shouldn't be too bad, but it will definitely need to happen within the compiler's code generation (compiler/ppc/code-generator-ppc.cc) rather than emitting different instructions conditionally at the assembler bits. Effectively, as far as I can tell, frim and friz are being used strictly in rounding instructions rather than to integer conversions. I think this is the only entry point I need to stub in code for (correct me if I'm wrong).
It would be really handy if some of this code generation could fall back to LLVM to generate these bits of the JIT, but I suppose that's somewhat suboptimal as well, only allowing for function level abstractions rather than generic blocks.
I know, optimal dynarec'ing JITs are hard, but it does irritate me the level of effort it takes to bootstrap something like firefox when 6 years ago there were readily available PPC Linux builds. I appreciate the effort porting v8/nodejs to POWER, that should have been most of the heavy lifting.
This is my naive attempt at reimplementing each of the four instructions using the PowerPC instructions. I have admittedly never touched an FPU in assembly language (on any CPU) before today.
In my own testing, comparing to the musl C library's implementations of floor/round/ceil/trunc, they give me the same results on a POWER9 as the single instructions do, with the exception of values in [-0.0..-0.9]. In these cases, this version loses the signedness of the -0.0 result.
I don't know how to wire this up to v8's JIT. Since the flag is there, I know it should look something like:
case kPPC_TruncateDouble:
if (CpuFeatures::IsSupported(FPU)) {
ASSEMBLE_FLOAT_UNOP_RC(friz, MiscField::decode(instr->opcode()));
} else {
/* stuff goes here */
}
but that's as far as I have figured out as of yet. Since pasting this code into a comment on GitHub makes it ambiguous, I release this code under both the BSD-3-Clause and MIT license.
_set_round_mode:
; in = r3 = the bits to set
; out = none
; side-effects = FPCSR [RN] = r3
stdu 1,-128(1)
stw 3,116(1)
li 3,0
stw 3,112(1)
lfd 6,112(1)
mtfsf 0x01,6
ld 1,0(1)
blr
.global frim
.type frim,@function
frim:
stdu 1,-128(1)
mflr 0
std 0,144(1)
; CR7: Saved [RN]
mcrfs 7,7
mfocrf 3,0x01
ori 3,3,0b11
bl _set_round_mode
nop
fctid 1,1
fcfid 1,1
; put it back
mfocrf 3,0x01
bl _set_round_mode
nop
ld 0,144(1)
mtlr 0
ld 1,0(1)
blr
.global frip
.type frip,@function
frip:
stdu 1,-128(1)
mflr 0
std 0,144(1)
mcrfs 7,7
mfocrf 3,0x01
andi. 3,3,0b1100
ori 3,3,0b10
bl _set_round_mode
nop
fctid 1,1
fcfid 1,1
mfocrf 3,0x01
bl _set_round_mode
nop
ld 0,144(1)
mtlr 0
ld 1,0(1)
blr
.global friz
.type friz,@function
friz:
stdu 1,-128(1)
mflr 0
std 0,144(1)
mcrfs 7,7
mfocrf 3,0x01
andi. 3,3,0b1100
ori 3,3,1
bl _set_round_mode
nop
fctid 1,1
fcfid 1,1
mfocrf 3,0x01
bl _set_round_mode
nop
ld 0,144(1)
mtlr 0
ld 1,0(1)
blr
.global frin
.type frin,@function
frin:
stdu 1,-128(1)
mflr 0
std 0,144(1)
mcrfs 7,7
mfocrf 3,0x01
andi. 3,3,0b1100
bl _set_round_mode
nop
fctid 1,1
fcfid 1,1
mfocrf 3,0x01
bl _set_round_mode
nop
ld 0,144(1)
mtlr 0
ld 1,0(1)
blr
Having NAS based on POWER QUICC III. Also getting "Illegal instruction" after npm run. Should one hold out any hope for future implementation processor specific instruction?
We have not had the opportunity to add the fallback paths as @awilfox had developed... we'd need to address the [-0.0..-0.9] results also.
My bigger concern is that the latest Node.js and V8 itself support 64-bit only on POWER. 32-bit paths are neither tested nor maintained moving forwards -- even if we integrate the fallback paths for the FPU instructions as above into V8 master, there are no guarantees the rest of V8 will continue to work on 32-bit micro controllers.
Thanks for your reply.
Hi @KungFuJesus @joransiu ,
I found this bug while searching for "friz" because I just ran into this very issue trying to bootstrap Node.js in Gentoo Linux on a PowerMac G5 (ppc64). I'll work on my own fallback patch based on the comments in this thread.
I also wanted to note that ppc64 big-endian is ELFv1, which uses function descriptors, so I had to make a few small local patches (which I'll make a pull request for after some testing) to check for #if ABI_USES_FUNCTION_DESCRIPTORS
instead of #if defined(_AIX)
. Fortunately, the code that sets the value for ABI_USES_FUNCTION_DESCRIPTORS
correctly sets it to 1 for AIX or 64-bit big-endian, but not ppc64le or 32-bit PPC (if I have that right).
Also, two places in the Node.js 14.15.0 code assume the minimum physical page size is 64KB instead of 4KB for both ppc and ppc64, which caused a debug assertion failure in the first case I noticed (the other I found in code inspection). There's an open bug for at least the nouveau X.org driver (which I'm using on my G5) that it won't work if you change the physical page size from 4KB to 64KB.
I also wanted to note that ppc64 big-endian is ELFv1, which uses function descriptors,
It's worth noting that this is not always true - only legacy distributions continue to use ELFv1 on BE. Modern distributions like void-ppc and all musl-based distributions like Adélie always use ELFv2, regardless of endianness. I believe there are plans for Gentoo to publish an ELFv2 stage3 tarball for BE as well too.
I also wanted to note that ppc64 big-endian is ELFv1, which uses function descriptors,
It's worth noting that this is not always true - only legacy distributions continue to use ELFv1 on BE. Modern distributions like void-ppc and all musl-based distributions like Adélie always use ELFv2, regardless of endianness. I believe there are plans for Gentoo to publish an ELFv2 stage3 tarball for BE as well too.
That's very interesting to know, thanks! It's very awkward to transition, though, if you can't have both types on the same system disk at the same time. I know there aren't a lot of precompiled binaries for Linux/ppc, but if there's no good multilib solution to have ELFv1 and ELFv2 libraries at the same time, it seems problematic.
I'll definitely keep an eye on void-ppc.
Hi @KungFuJesus @joransiu ,
I found this bug while searching for "friz" because I just ran into this very issue trying to bootstrap Node.js in Gentoo Linux on a PowerMac G5 (ppc64). I'll work on my own fallback patch based on the comments in this thread.
I'm just about to test a patch I've been working on to hopefully tell V8 not to generate the FP rounding instructions if the CPU doesn't support them. First I worked on extending a patch I got from @awilfox to generate a fallback code sequence, but it ended up having branches and looked ugly and I'm not sure it's correct. Then today I realized that the x64 backend sets the flags for kFloat32RoundDown
, kFloat64RoundDown
, kFloat32RoundUp
, etc.. only if CpuFeatures::IsSupported(SSE4_1)
is true, so I think I should be able to do the same thing and only set the equivalent flags for PowerPC if the chip supports the feature.
Presumably there's a generic backend for the rounding operations that will kick in when I tell V8 that my G5 doesn't have those operations. I'll submit a pull request after I've tested all of this.
Awesome - compiling this JIT even in emulation mode has confounded my ability to build firefox (not sure why node.js for firefox became a thing) for a while.
Hello, as a reference I just wanted to point out a similar PR we had on nodjes regarding ppc64 ELFv1 and function descriptors: https://github.com/nodejs/node/pull/33866 Above PR was added to V8 using this CL: https://chromium-review.googlesource.com/c/v8/v8/+/2248202
Ok, I have an initial PR that solves the codegen problem by setting the CPU flags to not generate the instructions when it doesn't have them. I'm still testing it, but I'm happy to share and get feedback, especially to make sure it doesn't break the newer systems. I noticed the CPU detection logic wasn't setting the feature flags for POWER9, and it looked to me like it should be setting everything that POWER8 has. https://github.com/nodejs/node/pull/35988
Wonder what value kMaxRegularHeapObjectSize should take if patch narrowing page size? Or it has no effect.
adam@g5box ~ $ node
Welcome to Node.js v14.5.0.
Type ".help" for more information.
> .helpIllegal instruction
adam@g5box ~ $ node
Welcome to Node.js v15.3.0.
Type ".help" for more information.
> .help
.break Sometimes you get stuck, this gets you out
.clear Alias for .break
.editor Enter editor mode
.exit Exit the REPL
.help Print this help message
.load Load JS from a file into the REPL session
.save Save all evaluated commands in this REPL session to a file
Press Ctrl+C to abort current expression, Ctrl+D to exit the REPL
>
Promising so far (before vs after patch). I'll do a bit more testing - is there anything simple I should try in the interpreter?
I've compiled node 0.11.12-release-ppc on Ubuntu 12.04 on Vagrant on OS X. The target device is a Synology DS413 (NAS). I used the proper cross-compilation toolchain by Synology. Compilation succeeded only without a snapshot.
The illegal instruction (or SIGILL) occurs when no arguments are given, but never if given -v or -h. I have not gone through all possible CLI argument combinations.
With gdb on the NAS I've tried to locate the point where the SIGILL occurs, but have failed to reach the goal. If I ask for the backtrace right after the SIGILL, this is how it looks:
Program received signal SIGILL, Illegal instruction. 0x33c2f5e0 in ?? () (gdb) backtrace
0 0x33c2f5e0 in ?? ()
1 0x33c2f1f4 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Running gdb I found that there are lots of things done in the 'v8' namespace C++ code which makes me think the v8 that comes with this node.js fork has not been compiled for this CPU - no surprise - and I haven't succeeded in compiling v8ppc for it yet.
To fix this, should I find out if the problem really is in v8 (how?), and therefore compile v8ppc for this CPU, or do something to node?