Open he32 opened 7 months ago
That aarch64 stack overflow I mentioned above is now reported in https://github.com/rust-lang/rust/issues/123551.
So ... this looks at least on the surface of it like a "boring" NULL pointer de-reference.
Given that this is in Rust, it is not at all boring... it seems very unlikely to be a bug in the crashing part of the compiler, this is probably a miscompilation.
Maybe. But ... if this is a code generation bug, is anything similar observed on other armv7 platforms?
I on my side will re-try the build with an external LLVM, version 16.0.6, instead of the embedded 17.0.6 version, though that will require an LLVM rebuild, so will take a while to complete. Will return with status when that's done.
people generally don't compile rustc on armv7 platforms (it's a pretty weird thing to do imo, especially as it does take so long, cross compilation is a thing) so I wouldn't be too surprised if no one has hit this
Hm, yes, I'm not surprised. However, self-building rust turns out to be an effective "stress test" of the compiler.
Wrt. the included LLVM: rust 1.76.0 contains the same version of LLVM (17.0.6) as rust 1.77.1, so ... this makes it slightly dubious whether replacing the embedded LLVM with an external one will fix this problem. :(
If you could bisect it down that would be great... though it would take quite a while, so not exactly ideal either. :upside_down_face:
Bisecting will be troublesome at best, since my build setup is based on rust tarballs and not github checkouts. Though I'm receptive to hints on how to do that. No promises, though...
Just checking: the title of this issue, and OP, both mention "1.71.1", but I presume they are typos for "1.77.1"?
Yes. How sloppy of me. I see that's fixed.
What about using git archive to generate a tarball?
I don't think that will do the same -- won't that leave out all the vendored crates?
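One way to check the concern above on a candidate tarball is to look for a vendor directory inside it. The following is a minimal sketch, assuming the layout of the release source tarballs (a single top-level directory containing vendor/); the function name and the file name in the usage comment are hypothetical. For background: git archive only packs files tracked in the given commit and does not descend into submodules.

```python
import tarfile


def has_vendor_dir(tarball_path: str) -> bool:
    """Return True if the tarball holds a <top-level>/vendor/ entry.

    Assumption: one top-level directory, as in the release source tarballs.
    """
    with tarfile.open(tarball_path) as tar:
        for member in tar:
            # Drop empty and "." components so "./src/vendor/x" also matches.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            if len(parts) >= 2 and parts[1] == "vendor":
                return True
    return False


# Usage (hypothetical file name):
# has_vendor_dir("rustc-candidate-src.tar.gz")
```

A tarball for which this returns False would need the vendored crates added by some other means before an offline build could work.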
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-low
FWIW, I did a re-try where I configured rust to use an external LLVM (version 16.0.6), and perhaps unsurprisingly, this didn't fix the issue -- the stage-2 rust compiler still gets a SEGV compiling the same code (though I didn't do GDB this time). So this is an indication that this is probably a bug in the part of the compiler written in rust. "How fun."
people generally don't compile rustc on armv7 platforms (it's a pretty weird thing to do imo, especially as it does take so long, cross compilation is a thing) so I wouldn't be too surprised if no one has hit this
Yes, I do realize that cross compilation is a thing. However, one must have some assurance that the cross-compilation produces correct results. So in the meantime I have tried using the cross-built rust compiler (built on NetBSD/amd64, targeting NetBSD/armv7) to natively build the dua-cli rust application, and that falls on its face with cargo getting a SEGV crash and core dump fairly quickly. A gdb session on the core file just produces:
armv7: {147} gdb /usr/pkg/bin/cargo ./work/dua-cli-2.20.1/cargo.core
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7--netbsdelf-eabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/pkg/bin/cargo...
(No debugging symbols found in /usr/pkg/bin/cargo)
[New process 1]
Core was generated by `cargo'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0cc5c960 in std::panicking::try ()
(gdb) where
#0 0x0cc5c960 in std::panicking::try ()
#1 0x0cc3eae4 in std::rt::lang_start_internal ()
#2 0x0bccd020 in main ()
(gdb) i reg
r0 0x44a3a4 4498340
r1 0x1 1
r2 0x1f8 504
r3 0x5c092000 1544101888
r4 0x7ff46fd8 2146725848
r5 0x7ff46ffc 2146725884
r6 0x5c092200 1544102400
r7 0xd0b3e24 218840612
r8 0x2401a8fe 604088574
r9 0xf798910e 4153970958
r10 0x0 0
r11 0x7ff47024 2146725924
r12 0x2 2
sp 0x7ff46fa8 0x7ff46fa8
lr 0xcc5c958 214288728
pc 0xcc5c960 0xcc5c960 <std::panicking::try+108>
cpsr 0x200b0010 537591824
(gdb) x/i $pc
=> 0xcc5c960 <_ZN3std9panicking3try17h66c30643ffe8f2f0E+108>: ldr r10, [r9]
(gdb) x/x 0xf798910e
0xf798910e: Cannot access memory at address 0xf798910e
(gdb) q
So... This has me asking what the state is for testing the results of cross compilation from other systems targeting armv7? The above test should at least be fairly easy to carry out on other armv7 systems.
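The faulting instruction in the gdb session above, ldr r10, [r9], loads a 32-bit word from the address held in r9. A small sketch that classifies that address, using register values copied by hand from the dump (the helper names are made up for illustration):

```python
# Register lines copied from the `i reg` output in the gdb session above.
REG_DUMP = """\
r9 0xf798910e 4153970958
r10 0x0 0
sp 0x7ff46fa8 0x7ff46fa8
pc 0xcc5c960 0xcc5c960 <std::panicking::try+108>
"""


def parse_registers(dump: str) -> dict[str, int]:
    """Map register name to value, reading the hex column of each line."""
    regs = {}
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            regs[fields[0]] = int(fields[1], 16)
    return regs


def describe_pointer(addr: int, word_size: int = 4) -> str:
    """Classify an address as null, misaligned, or aligned for a word load."""
    if addr == 0:
        return "null"
    if addr % word_size != 0:
        return "non-null, misaligned"
    return "non-null, aligned"


regs = parse_registers(REG_DUMP)
print(hex(regs["r9"]), describe_pointer(regs["r9"]))
# prints: 0xf798910e non-null, misaligned
```

So in this cargo crash the bad address is neither NULL nor word-aligned, which suggests (but does not prove) that r9 holds a garbage value rather than a pointer that was simply never initialized.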
I have re-tried this with rust 1.78.0 which brings along a new major version of the embedded LLVM, and the status is basically the same as reported above. I'm also building rust for among others amd64, riscv64, sparc64, 32-bit powerpc, all on NetBSD, and none of these behave this way. I'm therefore left wondering if there is something CPU-specific for armv7 which should be done on NetBSD, which is new in 1.77.*, and which currently isn't there.
Part of my summarized test results is as follows, all on NetBSD. "Success" is gauged by the platform self-hosting the rust compiler and being able to build librsvg natively with the result, i.e. not just cross-building the compiler for use on the target. So "something" definitely happened between version 1.76.0 and 1.77.1 which made all of the arm targets fail (both 32-bit and 64-bit). Hints gratefully accepted...
          1.75.0  1.76.0  1.77.1  1.78.0
aarch64     x       x       f       f
amd64       x       x       x       x
armv6       x       x       f       f
armv7       x       x       f       f
i386        x       x       x       x
ppc         x       x       x       x
riscv64     x       x       x       o
sparc64     x       x       x       o

x = tests ok
f = failed one or more tests
o = ongoing / undecided
This indicates that what I'm seeing is not an "OS error", probably not an artifact of my build setup (since it successfully cross-compiles for other CPUs / targets, and earlier versions have produced working results for armv7), and is probably CPU-dependent, perhaps coupled with the particular OS I'm building for.
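The results matrix above can also be scanned mechanically for the first failing release per target. A minimal sketch, with the table transcribed by hand into a dictionary (the variable and function names are made up for illustration):

```python
RELEASES = ["1.75.0", "1.76.0", "1.77.1", "1.78.0"]

# One mark per release: x = ok, f = failed, o = ongoing / undecided.
RESULTS = {
    "aarch64": "xxff",
    "amd64": "xxxx",
    "armv6": "xxff",
    "armv7": "xxff",
    "i386": "xxxx",
    "ppc": "xxxx",
    "riscv64": "xxxo",
    "sparc64": "xxxo",
}


def first_failure(marks: str):
    """Return the first release marked 'f', or None if none failed."""
    for release, mark in zip(RELEASES, marks):
        if mark == "f":
            return release
    return None


regressed = {
    target: first_failure(marks)
    for target, marks in RESULTS.items()
    if first_failure(marks) is not None
}
print(regressed)
# prints: {'aarch64': '1.77.1', 'armv6': '1.77.1', 'armv7': '1.77.1'}
```

All three Arm targets first fail at 1.77.1 and no other target fails at all, which matches the reading that the change landed between 1.76.0 and 1.77.1 and is Arm-specific.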
Hmm, I guess https://github.com/rust-lang/rust/pull/122002 doesn't fix it either.
The likely fix is to disable has_thread_local for the Arm NetBSD targets; see https://github.com/rust-lang/rust/issues/123551
While working through the testing of rust 1.77.1 for the various NetBSD targets we try to keep rust running on, I am having trouble getting rust to build "natively" on an emulated armv7 system.
I expected to see this happen: The build of rust should complete.
Instead, this happened: Rust 1.77.1 fails to build, while the build of 1.76.0 succeeded.
The error I see is a SIGSEGV error in the rust build, using the stage2 rust compiler, which is built with the embedded LLVM in the rust distribution, either when building the libc crate, or sometimes (I tried restarting the build multiple times) when building the proc-macro2 or unicode-ident vendor crates.
I initially tried running the build with a parallelism of 3, but have dialed that down to 1 now to eliminate any issues with that and also to make the build log less confusing, and it now looks like it has consistent problems building the libc-v0.2.151 crate. This happens quite a bit into the build, initially after 40+ hours, and as mentioned, while using the stage2 compiler, so this is rustc 1.77.1 itself. I initially had problems getting any information out of this from gdb, but this now succeeded, hence this problem report. The end of the build log was in this latest rebuild attempt:
and the corresponding gdb session looks like it actually managed to pick the appropriate information out of the system:
So ... this looks at least on the surface of it like a "boring" NULL pointer de-reference.
However, I am also having issues building rust 1.77.1 natively on arm64 (not yet reported), and on that platform a stack overflow is flagged, so I wonder if this could be the same problem. The stack in the above backtrace is kind of long...
Meta
rustc --version --verbose:
Backtrace
See above.