Yashinde145 opened this issue 2 months ago (open): `cargo build` segfaults in an SDK environment using poky sources.
I don't know what this means. How can we reproduce what you are doing? Can you maybe create a repository that reproduces your issue?
Sure, I will create a repo and share the link here in some time.
looks like an incremental compilation bug? :/
Here is the link to the repo: https://github.com/Yashinde145/Cargo_seg_fault_1_79. You can fork it and follow these steps.
Steps to reproduce the seg fault:
1. `$ cd Cargo_seg_fault_1_79`
2. `$ source oe-init-build-env` (a new `build` dir is generated, and the working directory is now `/home/Cargo_seg_fault_1_79/build`)
3. Add the following to the `build/conf/local.conf` file:
TOOLCHAIN_TARGET_TASK = "cargo rust"
TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"
TOOLCHAIN_TARGET_TASK:append = " libstd-rs"
IMAGE_CLASSES += "testimage testsdk"
TESTIMAGE_AUTO:qemuall = "1"
SANITY_TESTED_DISTROS=""
4. `$ bitbake core-image-sato -c do_testsdk` (the build may take around 30-40 minutes)
The following error logs will be seen:
NOTE: test_cargo_build (rust.RustCompileTest)
NOTE: ... ERROR
Traceback (most recent call last):
  File "/home/poky/meta/lib/oeqa/sdk/cases/rust.py", line 34, in test_cargo_build
    self._run('cd %s/hello; cargo build' % self.tc.sdk_dir)
  File "/home/poky/meta/lib/oeqa/sdk/case.py", line 15, in _run
    return subprocess.check_output(". %s > /dev/null; %s;" % \
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
oeqa.utils.subprocesstweak.OETestCalledProcessError
// Above is the Python backtrace from the scripts that run the SDK tests
Command: '. /home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk/environment-setup-core2-64-poky-linux > /dev/null; // this command sources the SDK environment
cd /home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk//hello; // hello world Rust source
cargo build --target x86_64-pokysdk-linux-gnu;' returned non-zero exit status 101 // the cargo build command
After this, the backtrace mentioned above is seen.
All of the changes in the yocto patches seem like they are designed to make it more likely that an incremental compilation failure occurs by:
- removing build hashes
- removing layout verification
- removing target checks

This does not mean "yocto did it". But it is harder to see what did.
@Yashinde145 Your problem would be significantly easier to diagnose if this build system at least guaranteed sufficient symbol tables and/or frame-pointers in the build dependencies so that we get back a full backtrace to work with. All I can guess from here is that you should set `RUSTFLAGS=-Cincremental=false` forever.
https://github.com/Yashinde145/Cargo_seg_fault_1_79/blob/c6956f2339e5b30d8ae00a23377e5ed7117f47e9/meta/recipes-devtools/rust/files/0001-cargo-do-not-write-host-information-into-compilation.patch from a cursory look at your patches, this one seems really suspicious. it doesn't just remove the host, it also removes the rustc version. so maybe you're getting overlapping build directories for different rustc versions somewhere, which is guaranteed to cause crashes? not sure if that's what's happening, but it might be. doesn't even have to be caused by incremental, could just be a dependency rlib.
It would be useful to know if it still crashes without these patches.
Does this actually need to be `core-image-sato` to reproduce? Why isn't `core-image-minimal` sufficient?
It would be nice if this reproducer was turnkey. As in one command. Not "source this, write this conf file, then run this build command, and still have no idea how to inject environment variables into a build system that is deliberately trying to be impervious to external settings, without reading a treatise about 'layers' and how much I would absolutely love them if I was building a custom distribution for SBCs."
> All of the changes in the yocto patches seem like they are designed to make it more likely that an incremental compilation failure occurs by:
> - removing build hashes
> - removing layout verification
> - removing target checks
>
> This does not mean "yocto did it". But it is harder to see what did.
The commit https://github.com/Yashinde145/Cargo_seg_fault_1_79/commit/c6956f2339e5b30d8ae00a23377e5ed7117f47e9 contains the incremental Rust version updates from v1.75 to v1.79, which include:
- Removing build hashes and layout verification: these were removed because Rust builds were not reproducible (the reproducibility test checks whether the same host and build configuration yield identical binaries when built in different build directories).
- Removing target checks: some of the Rust tests were unsupported or failed in yocto oe-selftest (yocto's framework for testing toolchains and packages), so they were skipped/excluded.
> @Yashinde145 Your problem would be significantly easier to diagnose if this build system at least guaranteed sufficient symbol tables and/or frame-pointers in the build dependencies so that we get back a full backtrace to work with. All I can guess from here is that you should set `RUSTFLAGS=-Cincremental=false` forever.
Agreed. Even `RUST_BACKTRACE=full` didn't give a detailed backtrace here. I will check if there's any way to get symbol tables/frame pointers in the build system.
I guess `RUSTFLAGS=-Cincremental` is used here to speed up the build process and compilation time. I will check again by disabling it.
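Here is a minimal sketch of that check, run by hand inside the SDK rather than through bitbake. It assumes the SDK paths from the oeqa log earlier in this issue. `SDK_DIR`, `ENV_SCRIPT` and `run_hello_build` are hypothetical names.

rustc's `-C incremental` option takes a directory path rather than a boolean. Cargo's documented `CARGO_INCREMENTAL=0` environment variable is therefore a more direct way to turn incremental compilation off. `cargo clean` first removes stale incremental caches and rlibs left by earlier builds.

```shell
#!/bin/sh
# Sketch: rebuild the SDK "hello" test crate with incremental compilation off.
# SDK_DIR defaults to the path seen in the oeqa log; override it if yours differs.
SDK_DIR=${SDK_DIR:-/home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk}
ENV_SCRIPT="$SDK_DIR/environment-setup-core2-64-poky-linux"

run_hello_build() {
    # Subshell, so sourcing the SDK environment does not leak into the caller.
    (
        . "$ENV_SCRIPT" > /dev/null || exit 1
        cd "$SDK_DIR/hello" || exit 1
        # Remove target/ so no stale incremental caches or rlibs are reused.
        cargo clean || exit 1
        # With CARGO_INCREMENTAL=0, Cargo does not pass -C incremental to rustc.
        CARGO_INCREMENTAL=0 RUST_BACKTRACE=full \
            cargo build --target x86_64-pokysdk-linux-gnu
    )
}

if [ -f "$ENV_SCRIPT" ]; then
    run_hello_build
else
    echo "SDK environment script not found: $ENV_SCRIPT" >&2
fi
```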
> Does this actually need to be `core-image-sato` to reproduce? Why isn't `core-image-minimal` sufficient?
The issue is seen when running the `do_testsdk` task, which is provided by `core-image-sato`.
> It would be nice if this reproducer was turnkey. As in one command. Not "source this, write this conf file, then run this build command, and still have no idea how to inject environment variables into a build system that is deliberately trying to be impervious to external settings, without reading a treatise about 'layers' and how much I would absolutely love them if I was building a custom distribution for SBCs."
I will try to share the reproducer with a bash script in a while.
@Yashinde145 making binaries more "reproducible" by making them look more similar to each other, when those differences are often what rustc and cargo use to prevent compiling and linking code that is not ABI-compatible, is dangerous.
And this patch unconditionally disables a check instead of unconditionally enabling it. Which means rustc no longer checks if it can produce correct code, ever: https://github.com/yoctoproject/poky/commit/c08c522fc29445aef0c64f0dd8df8a3531c04afa
Do you see the problem with yocto repeatedly simply disabling safety checks in the name of "reproducibility", and then filing bugs against rustc? Why is yocto waiting until they've busted the compiler's output instead of asking how the compiler can be patched to make something more reproducible?
> https://github.com/Yashinde145/Cargo_seg_fault_1_79/blob/c6956f2339e5b30d8ae00a23377e5ed7117f47e9/meta/recipes-devtools/rust/files/0001-cargo-do-not-write-host-information-into-compilation.patch from a cursory look at your patches, this one seems really suspicious. it doesn't just remove the host, it also removes the rustc version. so maybe you're getting overlapping build directories for different rustc versions somewhere, which is guaranteed to cause crashes? not sure if that's what's happening, but it might be. doesn't even have to be caused by incremental, could just be a dependency rlib.
>
> It would be useful to know if it still crashes without these patches.
Yes, I will check by reverting the `0001-cargo-do-not-write-host-information-into-compilation.patch` and applying the actual fix https://github.com/rust-lang/cargo/pull/14107 for the problem.
> @Yashinde145 making binaries more "reproducible" by making them look more similar to each other, when those differences are often what rustc and cargo use to prevent compiling and linking code that is not ABI-compatible, is dangerous.
> And this patch unconditionally disables a check instead of unconditionally enabling it. Which means rustc no longer checks if it can produce correct code, ever: yoctoproject/poky@c08c522
> Do you see the problem with yocto repeatedly simply disabling safety checks in the name of "reproducibility", and then filing bugs against rustc? Why is yocto waiting until they've busted the compiler's output instead of asking how the compiler can be patched to make something more reproducible?
I am not sure what the actual cause is here. Let me cross-check with everyone's input, and then the picture might be clearer.
Reproducer script: repro.txt
#!/bin/bash
# Exit immediately if a command exits with a non-zero status
set -e
# Clone the repository
git clone https://github.com/Yashinde145/Cargo_seg_fault_1_79
cd Cargo_seg_fault_1_79
# Source the OE build environment
source oe-init-build-env
# Define the local.conf file path
LOCAL_CONF_FILE="conf/local.conf"
# Append necessary configurations to local.conf
echo 'TOOLCHAIN_TARGET_TASK = "cargo rust"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_TARGET_TASK:append = " libstd-rs"' >> $LOCAL_CONF_FILE
echo 'IMAGE_CLASSES += "testimage testsdk"' >> $LOCAL_CONF_FILE
echo 'TESTIMAGE_AUTO:qemuall = "1"' >> $LOCAL_CONF_FILE
echo 'SANITY_TESTED_DISTROS=""' >> $LOCAL_CONF_FILE
# Run the bitbake command
bitbake core-image-sato -c do_testsdk
Does this reproduce with the latest rustc nightly and all obsolete patches dropped? Half the patches seem to be obsolete, and for some of them the original patch is not entirely correct, I think.
I updated the script to be repeatable from the root of the repository:
#!/bin/bash
# Exit immediately if a command exits with a non-zero status
set -e
# Source the OE build environment
source oe-init-build-env
# Define the local.conf file path
LOCAL_CONF_FILE="conf/local.conf"
local_conf_setup="$(grep "$LOCAL_CONF_FILE" -e 'TOOLCHAIN')"
# Append necessary configurations to local.conf
if [ "$local_conf_setup" ]; then
echo "Already set up conf file..."
else
echo 'TOOLCHAIN_TARGET_TASK = "cargo rust"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_TARGET_TASK:append = " libstd-rs"' >> $LOCAL_CONF_FILE
echo 'IMAGE_CLASSES += "testimage testsdk"' >> $LOCAL_CONF_FILE
echo 'TESTIMAGE_AUTO:qemuall = "1"' >> $LOCAL_CONF_FILE
echo 'SANITY_TESTED_DISTROS=""' >> $LOCAL_CONF_FILE
fi
# Run the bitbake command
bitbake core-image-sato -c do_testsdk
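The `grep -e 'TOOLCHAIN'` guard above skips the whole block as soon as any line containing `TOOLCHAIN` exists, even if the other settings are missing. A minimal alternative sketch appends each setting only when that exact line is not already present. It assumes POSIX `sh`, and `setup_local_conf` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: append each reproducer setting to local.conf at most once.
setup_local_conf() {
    conf=$1
    touch "$conf" || return 1
    # The quoted 'EOF' keeps ${MACHINE} literal, so BitBake expands it later.
    while IFS= read -r line; do
        # -x: whole-line match, -F: fixed string, -q: no output.
        grep -qxF -- "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done <<'EOF'
TOOLCHAIN_TARGET_TASK = "cargo rust"
TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"
TOOLCHAIN_TARGET_TASK:append = " libstd-rs"
IMAGE_CLASSES += "testimage testsdk"
TESTIMAGE_AUTO:qemuall = "1"
SANITY_TESTED_DISTROS=""
EOF
}
```

After `source oe-init-build-env`, calling `setup_local_conf conf/local.conf` would replace the `if`/`else` block. Running it a second time leaves the file unchanged.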
I then removed a number of patches, including the `hardcodepaths.patch` file (which, in reality, now only disables checking Rust's data layouts against LLVM's data layouts), and got this during the build:
| error: data-layout for target `x86_64-poky-linux-gnu`, `e-m:e-i64:64-f80:128-n8:16:32:64-S128`, differs from LLVM target's `x86_64-poky-linux-gnu` default layout, `e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128`
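To see exactly which components differ, a small sketch can split both layout strings on `-` and list the components that appear only in the second one. It assumes POSIX `sh`, and `layout_diff` is a hypothetical helper name. With the two layouts from the error above, it reports `p270:32:32`, `p271:32:32`, `p272:64:64` and `i128:128`. None of these appear in the layout rustc has for the `x86_64-poky-linux-gnu` target.

```shell
#!/bin/sh
# Hypothetical helper: print components of data layout $2 that are missing from $1.
# The ( ... ) body runs in a subshell, so IFS and set -f do not leak out.
layout_diff() (
    IFS=-
    set -f  # no globbing while word-splitting the layout
    for part in $2; do
        case "-$1-" in
            *"-$part-"*) ;;              # present in both layouts
            *) printf '%s\n' "$part" ;;  # only in the second layout
        esac
    done
)

# Layout strings copied from the build error.
RUST_LAYOUT='e-m:e-i64:64-f80:128-n8:16:32:64-S128'
LLVM_LAYOUT='e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128'

echo "Only in LLVM's default layout:"
layout_diff "$RUST_LAYOUT" "$LLVM_LAYOUT"
echo "Only in the x86_64-poky-linux-gnu target's layout:"
layout_diff "$LLVM_LAYOUT" "$RUST_LAYOUT"
```

This is only a textual comparison. It does not by itself say how the poky target configuration should be changed.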
@Yashinde145 Please remove the `hardcodepaths.patch` file and commit the necessary fixes to correctly build the poky targets. The version as of your update commit does not remove any paths embedded into the binary anymore; it simply disables a safety check for codegen. Rust has quite enough codegen-correctness problems, against what are often unclearly-documented-at-best ABIs, without your patches introducing more.
I do not think we can help you further with your problems until that is done.
Update on:
> Yes, I will check by reverting the `0001-cargo-do-not-write-host-information-into-compilation.patch` and applying the actual fix https://github.com/rust-lang/cargo/pull/14107 for the problem.

and

> I guess `RUSTFLAGS=-Cincremental` is used here to speed up the build process and compilation time. I will check again by disabling it.

I made both changes and still get the same error.
@workingjubilee, thanks for the script updates.
> @Yashinde145 Please remove the `hardcodepaths.patch` file and commit the necessary fixes to correctly build the poky targets.
I will make the changes you suggested, but it may take some time to fix the data-layout mismatch error.
I will update here once I have the changes ready.
`cargo build` for a simple hello world program segfaults when built in an SDK environment using poky sources. This was first observed in rustc v1.78 and continues in v1.79 and v1.80. (Note: the process for building the SDK environment did not change between the versions tested.)
`rustc --version --verbose`:
Error output
The backtrace is the same with `RUST_BACKTRACE=1` and `RUST_BACKTRACE=full`. I suspect the following out-of-bounds index access at compiler/rustc_metadata/src/creader.rs:193:31 is the main reason for the segfault here: `index out of bounds: the len is 20 but the index is 60747757`. https://github.com/rust-lang/rust/commit/fbc9b94064c611f341d613ce14d321b267b3c298 and https://github.com/rust-lang/rust/commit/0025c9cc5001c194ac1f8ce4824d93a0efbe5c18 are recent commits related to this. Maybe @oli-obk can help explain the error?
Backtrace
```
{"$message_type":"artifact","artifact":"/home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk/hello/target/debug/build/hello-69a92b98b70371ba/build_script_build-69a92b98b70371ba.d","emit":"dep-info"}
thread 'rustc' panicked at compiler/rustc_metadata/src/creader.rs:193:31:
index out of bounds: the len is 20 but the index is 60747757
stack backtrace:
0: 0x7fa9a2bcf31f - ::fmt::hdd8826f6b9d3bb6e
1: 0x7fa9a2c0246b - core::fmt::write::hce77028645369722
2: 0x7fa9a2bca2ce -
3: 0x7fa9a2bcf0ee -
4: 0x7fa9a2bb889a -
5: 0x7fa9a2bb8594 - std::panicking::default_hook::h49af0c7febe67f8d
6: 0x7fa9a37e7187 -
7: 0x7fa9a2bb90e9 - std::panicking::rust_panic_with_hook::h14a0ca211eb21fbf
8: 0x7fa9a2bcf6e2 -
9: 0x7fa9a2bcf529 -
10: 0x7fa9a2bb8cc6 - rust_begin_unwind
11: 0x7fa9a2b71422 - core::panicking::panic_fmt::hb4b7de66d883fcc4
12: 0x7fa9a2b715f6 - core::panicking::panic_bounds_check::h4eecb12f9bb341c4
13: 0x7fa9a83c8ffd - ::stable_crate_id
14: 0x7fa9a89931e9 -
15: 0x7fa9a89978ed - ::serialize
16: 0x7fa9a8891541 - ::serialize_query_result_cache
17: 0x7fa9a826b2ea -
18: 0x7fa9a82640cc -
19: 0x7fa9a826b5c0 -
20: 0x7fa9a827c0f1 -
21: 0x7fa9a822acd9 - rustc_incremental[7e9745f68514030c]::persist::save::save_dep_graph
22: 0x7fa9a37f6045 -
23: 0x7fa9a37a30bd -
24: 0x7fa9a37c7ba7 -
25: 0x7fa9a37c968d -
26: 0x7fa9a2bbc73b -
27: 0x7fa9a29c9b62 -
28: 0x7fa9a2a4463c -
29: 0x0 -
error: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md
note: rustc 1.79.0 (129f3b996 2024-06-10) (built from a source tarball) running on x86_64-pokysdk-linux-gnu
note: compiler flags: --crate-type bin -C embed-bitcode=no -C debuginfo=2 -C incremental=[REDACTED]
note: some of the compiler flags provided by cargo are hidden
query stack during panic:
end of query stack
error: rustc interrupted by SIGSEGV, printing backtrace
```
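The frames above place the panic inside `rustc_incremental::persist::save::save_dep_graph`, while rustc was serializing its incremental query-result cache. This is useful when comparing runs, for example with and without the suspect patches.

A small filter can report the panic site and whether that incremental-save frame is present. It is a sketch assuming the ICE text was saved to a file, and `ice_summary` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical triage helper for a saved rustc ICE log.
ice_summary() {
    log=$1
    # rustc prints "panicked at <file>:<line>:<col>:"; keep the first such site.
    site=$(sed -n 's/.*panicked at \([^:]*:[0-9]*:[0-9]*\):.*/\1/p' "$log" | head -n 1)
    printf 'panic site: %s\n' "${site:-unknown}"
    # The save_dep_graph frame means the panic happened while saving incremental state.
    if grep -q 'rustc_incremental.*save_dep_graph' "$log"; then
        printf 'panicked while saving incremental state: yes\n'
    else
        printf 'panicked while saving incremental state: no\n'
    fi
}

# Pass the saved log path as the first argument.
if [ $# -ge 1 ]; then
    ice_summary "$1"
fi
```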