rust-lang / rust


`poky`-patched cargo segfaults on `cargo build --bin helloworld` #128492

Open Yashinde145 opened 2 months ago

Yashinde145 commented 2 months ago

`cargo build` for a simple hello world program segfaults when built in an sdk environment using poky sources. This was first observed with rustc v1.78 and continues with v1.79 and v1.80. (Note: the process for building the sdk environment did not change between the tested versions.)

rustc --version --verbose:

rustc 1.79.0 (129f3b996 2024-06-10) (built from a source tarball)
binary: rustc
commit-hash: 129f3b9964af4d4a709d1383930ade12dfe7c081
commit-date: 2024-06-10
host: x86_64-pokysdk-linux-gnu
release: 1.79.0
LLVM version: 18.1.7

Error output

error: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.79.0 (129f3b996 2024-06-10) (built from a source tarball) running on x86_64-pokysdk-linux-gnu

note: compiler flags: --crate-type bin -C embed-bitcode=no -C debuginfo=2 -C incremental=[REDACTED]

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
error: rustc interrupted by SIGSEGV, printing backtrace

The generated backtrace is the same with "RUST_BACKTRACE=1" and "RUST_BACKTRACE=full". I suspect the following out-of-bounds index access is the main reason for the segfault, at compiler/rustc_metadata/src/creader.rs:193:31: `index out of bounds: the len is 20 but the index is 60747757`

https://github.com/rust-lang/rust/commit/fbc9b94064c611f341d613ce14d321b267b3c298 and https://github.com/rust-lang/rust/commit/0025c9cc5001c194ac1f8ce4824d93a0efbe5c18 are the recent commits related to this. Maybe @oli-obk can help to understand the error better?

Backtrace

```
{"$message_type":"artifact","artifact":"/home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk/hello/target/debug/build/hello-69a92b98b70371ba/build_script_build-69a92b98b70371ba.d","emit":"dep-info"}
thread 'rustc' panicked at compiler/rustc_metadata/src/creader.rs:193:31:
index out of bounds: the len is 20 but the index is 60747757
stack backtrace:
   0:     0x7fa9a2bcf31f - ::fmt::hdd8826f6b9d3bb6e
   1:     0x7fa9a2c0246b - core::fmt::write::hce77028645369722
   2:     0x7fa9a2bca2ce -
   3:     0x7fa9a2bcf0ee -
   4:     0x7fa9a2bb889a -
   5:     0x7fa9a2bb8594 - std::panicking::default_hook::h49af0c7febe67f8d
   6:     0x7fa9a37e7187 -
   7:     0x7fa9a2bb90e9 - std::panicking::rust_panic_with_hook::h14a0ca211eb21fbf
   8:     0x7fa9a2bcf6e2 -
   9:     0x7fa9a2bcf529 -
  10:     0x7fa9a2bb8cc6 - rust_begin_unwind
  11:     0x7fa9a2b71422 - core::panicking::panic_fmt::hb4b7de66d883fcc4
  12:     0x7fa9a2b715f6 - core::panicking::panic_bounds_check::h4eecb12f9bb341c4
  13:     0x7fa9a83c8ffd - ::stable_crate_id
  14:     0x7fa9a89931e9 -
  15:     0x7fa9a89978ed - ::serialize
  16:     0x7fa9a8891541 - ::serialize_query_result_cache
  17:     0x7fa9a826b2ea -
  18:     0x7fa9a82640cc -
  19:     0x7fa9a826b5c0 -
  20:     0x7fa9a827c0f1 -
  21:     0x7fa9a822acd9 - rustc_incremental[7e9745f68514030c]::persist::save::save_dep_graph
  22:     0x7fa9a37f6045 -
  23:     0x7fa9a37a30bd -
  24:     0x7fa9a37c7ba7 -
  25:     0x7fa9a37c968d -
  26:     0x7fa9a2bbc73b -
  27:     0x7fa9a29c9b62 -
  28:     0x7fa9a2a4463c -
  29:                0x0 -

error: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/issues/new?labels=C-bug%2C+I-ICE%2C+T-compiler&template=ice.md

note: rustc 1.79.0 (129f3b996 2024-06-10) (built from a source tarball) running on x86_64-pokysdk-linux-gnu

note: compiler flags: --crate-type bin -C embed-bitcode=no -C debuginfo=2 -C incremental=[REDACTED]

note: some of the compiler flags provided by cargo are hidden

query stack during panic:
end of query stack
error: rustc interrupted by SIGSEGV, printing backtrace
```

oli-obk commented 2 months ago

> sdk environment using poky sources.

I don't know what this means. How can we reproduce what you are doing? Can you maybe create a repository that reproduces your issue?

Yashinde145 commented 2 months ago

Sure, I will create a repo and share the link here in some time.

matthiaskrgr commented 2 months ago

looks like incr comp bug? :/

Yashinde145 commented 2 months ago

Here is the link to the repo: https://github.com/Yashinde145/Cargo_seg_fault_1_79
You can fork it and follow these steps.

Steps to reproduce the seg fault:

  1. $ cd Cargo_seg_fault_1_79
  2. $ source oe-init-build-env (a new build directory is generated and the working directory is now /home/Cargo_seg_fault_1_79/build)
  3. Add the following lines (config changes) to the build/conf/local.conf file:

    TOOLCHAIN_TARGET_TASK = "cargo rust"
    TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"
    TOOLCHAIN_TARGET_TASK:append = " libstd-rs"
    IMAGE_CLASSES += "testimage testsdk"
    TESTIMAGE_AUTO:qemuall = "1"
    SANITY_TESTED_DISTROS=""

  4. `$ bitbake core-image-sato -c do_testsdk` (the build may take around 30-40 minutes)

The following error logs will be seen:

    NOTE: test_cargo_build (rust.RustCompileTest)
    NOTE: ... ERROR
    Traceback (most recent call last):
      File "/home/poky/meta/lib/oeqa/sdk/cases/rust.py", line 34, in test_cargo_build
        self._run('cd %s/hello; cargo build' % self.tc.sdk_dir)
      File "/home/poky/meta/lib/oeqa/sdk/case.py", line 15, in _run
        return subprocess.check_output(". %s > /dev/null; %s;" % \
      File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/usr/lib/python3.10/subprocess.py", line 526, in run
        raise CalledProcessError(retcode, process.args,
    oeqa.utils.subprocesstweak.OETestCalledProcessError

// Above is the python scripts backtrace running the sdk test scripts

Command: . /home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk/environment-setup-core2-64-poky-linux > /dev/null; // this cmd invokes the sdk env

cd /home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk//hello; // hello world rust src

cargo build --target x86_64-pokysdk-linux-gnu;' returned non-zero exit status 101 // cargo build cmd
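The failing step in the log above can probably be re-run by hand, outside of bitbake, which makes it easier to experiment with environment variables. The following is a minimal sketch under that assumption: the function name `run_sdk_cargo_build` is hypothetical, while the environment script name, the `hello` directory, and the target triple are copied from the log. `RUST_BACKTRACE` is the standard Rust backtrace variable.

```shell
#!/bin/bash
# Sketch: repeat the SDK test's cargo build manually.
# $1: the testimage-sdk directory shown in the log (assumed to still exist
#     after the failed do_testsdk run).
run_sdk_cargo_build() {
  sdk_dir=$1
  env_script="$sdk_dir/environment-setup-core2-64-poky-linux"

  if [ ! -f "$env_script" ]; then
    echo "no SDK environment script at $env_script" >&2
    return 1
  fi

  # Use a subshell so the SDK environment does not leak into the caller.
  (
    . "$env_script" > /dev/null
    cd "$sdk_dir/hello" || exit 1
    RUST_BACKTRACE=full cargo build --target x86_64-pokysdk-linux-gnu
  )
}
```

For example: `run_sdk_cargo_build /home/poky/build/tmp/work/qemux86_64-poky-linux/core-image-sato/1.0/testimage-sdk`.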



After this, the above-mentioned backtrace is seen.

workingjubilee commented 2 months ago

All of the changes in the yocto patches seem like they are designed to make it more likely that an incremental compilation failure occurs by

  • removing build hashes
  • removing layout verification
  • removing target checks

This does not mean "yocto did it". But it is harder to see what did.

@Yashinde145 Your problem would be significantly easier to diagnose if this build system at least guaranteed sufficient symbol tables and/or frame-pointers in the build dependencies so that we get back a full backtrace to work with. All I can guess from here is that you should set RUSTFLAGS=-Cincremental=false forever.
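The suggestion above amounts to turning incremental compilation off. A minimal sketch of one way to try that, assuming a plain Cargo invocation: Cargo documents the `CARGO_INCREMENTAL` environment variable, and setting it to `0` disables incremental compilation. It is used here instead of a `-C incremental` flag because, as far as I know, rustc treats that option's value as a cache directory. The function name `build_without_incremental` is made up for this example.

```shell
#!/bin/bash
# Sketch: build with Cargo's incremental compilation disabled.
# CARGO_INCREMENTAL is a documented Cargo environment variable;
# the function name is hypothetical.
build_without_incremental() {
  # The assignment applies to this one cargo invocation only.
  CARGO_INCREMENTAL=0 cargo build "$@"
}

# Only attempt a build when cargo is available and we are inside a package.
if command -v cargo >/dev/null 2>&1 && [ -f Cargo.toml ]; then
  build_without_incremental
else
  echo "cargo or Cargo.toml not found; skipping build"
fi
```

If the crash disappears with this setting, that points at the incremental cache; if it persists, the cause is more likely elsewhere.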

Noratrieb commented 2 months ago

https://github.com/Yashinde145/Cargo_seg_fault_1_79/blob/c6956f2339e5b30d8ae00a23377e5ed7117f47e9/meta/recipes-devtools/rust/files/0001-cargo-do-not-write-host-information-into-compilation.patch from a cursory look at your patches, this one seems really suspicious. it doesn't just remove the host, it also removes the rustc version. so maybe you're getting overlapping build directories for different rustc versions somewhere, which is guaranteed to cause crashes? not sure if that's what's happening, but it might be. doesn't even have to be caused by incremental, could just be a dependency rlib.
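One way to test the overlapping-build-directory hypothesis might be to give each compiler version its own target directory, so artifacts from different rustc versions can never mix. This is only a sketch: `CARGO_TARGET_DIR` is a documented Cargo environment variable, but the helper `target_dir_for` and the `target-rustc-<version>` naming scheme are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: derive a per-compiler target directory from `rustc --version`.
# $1: a version line such as
#     "rustc 1.79.0 (129f3b996 2024-06-10) (built from a source tarball)"
target_dir_for() {
  # The second whitespace-separated field is the version number.
  version=$(printf '%s\n' "$1" | awk '{print $2}')
  printf 'target-rustc-%s\n' "$version"
}

# Export the directory only when a compiler is actually available.
if command -v rustc >/dev/null 2>&1; then
  CARGO_TARGET_DIR=$(target_dir_for "$(rustc --version)")
  export CARGO_TARGET_DIR
  echo "using $CARGO_TARGET_DIR"
fi
```

A subsequent `cargo build` in the same shell would then write to that directory instead of the shared `target/`.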

Noratrieb commented 2 months ago

It would be useful to know if it still crashes without these patches.

workingjubilee commented 2 months ago

Does this actually need to be core-image-sato to reproduce?

Why isn't core-image-minimal sufficient?

It would be nice if this reproducer was turnkey. As in one command. Not "source this, write this conf file, then run this build command, and still have no idea how to inject environment variables into a build system that is deliberately trying to be impervious to external settings, without reading a treatise about 'layers' and how much I would absolutely love them if I was building a custom distribution for SBCs."

Yashinde145 commented 2 months ago

> All of the changes in the yocto patches seem like they are designed to make it more likely that an incremental compilation failure occurs by
>
>   • removing build hashes
>   • removing layout verification
>   • removing target checks
>
> This does not mean "yocto did it". But it is harder to see what did.

The commit https://github.com/Yashinde145/Cargo_seg_fault_1_79/commit/c6956f2339e5b30d8ae00a23377e5ed7117f47e9 does increment rust version updates from v1.75 to v1.79 which includes:

Agreed. Even RUST_BACKTRACE=full didn't give a detailed backtrace here. I will check whether there is any way to get symbol tables or frame pointers in the build system.

I guess RUSTFLAGS=-Cincremental is used here to speed up the build process and compilation time. I will check again by disabling it.

Yashinde145 commented 2 months ago

> Does this actually need to be core-image-sato to reproduce? Why isn't core-image-minimal sufficient?

The issue is seen when running do_testsdk task which is provided by core-image-sato.

> It would be nice if this reproducer was turnkey. As in one command. Not "source this, write this conf file, then run this build command, and still have no idea how to inject environment variables into a build system that is deliberately trying to be impervious to external settings, without reading a treatise about 'layers' and how much I would absolutely love them if I was building a custom distribution for SBCs."

I will try to share the reproducer with a bash script in a while.

workingjubilee commented 2 months ago

@Yashinde145 making binaries more "reproducible" by making them look more similar to each other, when those differences are often what rustc and cargo use to prevent compiling and linking code that is not ABI-compatible, is dangerous.

And this patch unconditionally disables a check instead of unconditionally enabling it. Which means rustc no longer checks if it can produce correct code, ever: https://github.com/yoctoproject/poky/commit/c08c522fc29445aef0c64f0dd8df8a3531c04afa

Do you see the problem with yocto repeatedly simply disabling safety checks in the name of "reproducibility", and then filing bugs against rustc? Why is yocto waiting until they've busted the compiler's output instead of asking how the compiler can be patched to make something more reproducible?

Yashinde145 commented 2 months ago

> https://github.com/Yashinde145/Cargo_seg_fault_1_79/blob/c6956f2339e5b30d8ae00a23377e5ed7117f47e9/meta/recipes-devtools/rust/files/0001-cargo-do-not-write-host-information-into-compilation.patch from a cursory look at your patches, this one seems really suspicious. it doesn't just remove the host, it also removes the rustc version. so maybe you're getting overlapping build directories for different rustc versions somewhere, which is guaranteed to cause crashes? not sure if that's what's happening, but it might be. doesn't even have to be caused by incremental, could just be a dependency rlib.
>
> It would be useful to know if it still crashes without these patches.

Yes, I will check by reverting the 0001-cargo-do-not-write-host-information-into-compilation.patch and applying the actual fix https://github.com/rust-lang/cargo/pull/14107 for the problem.

Yashinde145 commented 2 months ago

> @Yashinde145 making binaries more "reproducible" by making them look more similar to each other, when those differences are often what rustc and cargo use to prevent compiling and linking code that is not ABI-compatible, is dangerous.
>
> And this patch unconditionally disables a check instead of unconditionally enabling it. Which means rustc no longer checks if it can produce correct code, ever: yoctoproject/poky@c08c522
>
> Do you see the problem with yocto repeatedly simply disabling safety checks in the name of "reproducibility", and then filing bugs against rustc? Why is yocto waiting until they've busted the compiler's output instead of asking how the compiler can be patched to make something more reproducible?

I am not sure what the actual cause is here. Let me cross-check against everyone's input, and then the picture might be clearer.

Yashinde145 commented 2 months ago

Reproducer script: repro.txt

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Clone the repository
git clone https://github.com/Yashinde145/Cargo_seg_fault_1_79
cd Cargo_seg_fault_1_79

# Source the OE build environment
source oe-init-build-env

# Define the local.conf file path
LOCAL_CONF_FILE="conf/local.conf"

# Append necessary configurations to local.conf
echo 'TOOLCHAIN_TARGET_TASK = "cargo rust"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"' >> $LOCAL_CONF_FILE
echo 'TOOLCHAIN_TARGET_TASK:append = " libstd-rs"' >> $LOCAL_CONF_FILE
echo 'IMAGE_CLASSES += "testimage testsdk"' >> $LOCAL_CONF_FILE
echo 'TESTIMAGE_AUTO:qemuall = "1"' >> $LOCAL_CONF_FILE
echo 'SANITY_TESTED_DISTROS=""' >> $LOCAL_CONF_FILE

# Run the bitbake command
bitbake core-image-sato -c do_testsdk

bjorn3 commented 2 months ago

Does this reproduce with the latest rustc nightly and all obsolete patches dropped? Half the patches seem to be obsolete and for some of them the original patch is not entirely correct I think.

workingjubilee commented 2 months ago

I updated the script to be repeatable from the root of the repository:

#!/bin/bash

# Exit immediately if a command exits with a non-zero status
set -e

# Source the OE build environment
source oe-init-build-env

# Define the local.conf file path
LOCAL_CONF_FILE="conf/local.conf"

# Append necessary configurations to local.conf, unless already present.
# grep runs inside the `if` condition so that a "no match" exit status
# does not abort the script under `set -e`.
if grep -q -e 'TOOLCHAIN' "$LOCAL_CONF_FILE"; then
  echo "Already set up conf file..."
else
  echo 'TOOLCHAIN_TARGET_TASK = "cargo rust"' >> $LOCAL_CONF_FILE
  echo 'TOOLCHAIN_HOST_TASK:append = " packagegroup-rust-cross-canadian-${MACHINE}"' >> $LOCAL_CONF_FILE
  echo 'TOOLCHAIN_TARGET_TASK:append = " libstd-rs"' >> $LOCAL_CONF_FILE
  echo 'IMAGE_CLASSES += "testimage testsdk"' >> $LOCAL_CONF_FILE
  echo 'TESTIMAGE_AUTO:qemuall = "1"' >> $LOCAL_CONF_FILE
  echo 'SANITY_TESTED_DISTROS=""' >> $LOCAL_CONF_FILE
fi

# Run the bitbake command
bitbake core-image-sato -c do_testsdk

I then removed a number of patches, including the "hardcodepaths.patch" file (in reality, it now only disables checking Rust's data layouts against LLVM's data layouts), and got this during the build:

| error: data-layout for target x86_64-poky-linux-gnu, e-m:e-i64:64-f80:128-n8:16:32:64-S128, differs from LLVM target's x86_64-poky-linux-gnu default layout, e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128
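To see exactly where the two layouts in this error diverge, the strings can be split on `-` and compared component by component. The following is only an illustrative sketch: `missing_components` is a made-up helper, and the two strings are copied verbatim from the error message above.

```shell
#!/bin/bash
# Data-layout strings copied from the error message.
rust_layout='e-m:e-i64:64-f80:128-n8:16:32:64-S128'
llvm_layout='e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128'

# Print every '-'-separated component of $2 that does not occur in $1.
missing_components() {
  for part in $(printf '%s' "$2" | tr '-' ' '); do
    # Wrap the haystack in '-' so only whole components match.
    case "-$1-" in
      *"-$part-"*) ;;
      *) printf '%s\n' "$part" ;;
    esac
  done
}

missing_components "$rust_layout" "$llvm_layout"
```

Run as written, this lists `p270:32:32`, `p271:32:32`, `p272:64:64` and `i128:128`, which are the components LLVM expects and the poky target's layout string does not declare.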

@Yashinde145 Please remove the hardcodepaths.patch file and commit the necessary fixes to correctly build the poky targets. The version as-of your update commit does not remove any paths embedded into the binary anymore, it simply disables a safety check for codegen. Rust has quite enough codegen-correctness problems, against what are often unclearly-documented-at-best ABIs, without your patches introducing more.

I do not think we can help you further with your problems until that is done.

Yashinde145 commented 2 months ago

Update for:

> Yes, I will check by reverting the 0001-cargo-do-not-write-host-information-into-compilation.patch and applying the actual fix https://github.com/rust-lang/cargo/pull/14107 for the problem.

and

> I guess RUSTFLAGS=-Cincremental is used here to speed up the build process and compilation time. I will check again by disabling it.

I made both changes and still get the same error.

Yashinde145 commented 2 months ago

@workingjubilee, thanks for the script updates.

> @Yashinde145 Please remove the hardcodepaths.patch file and commit the necessary fixes to correctly build the poky targets.

I will make the commit changes as suggested by you but it may take some time to fix the data-layout difference error.

I will update here once I have the changes ready.