Open dj8yfo opened 3 weeks ago
ub_checks::assert_unsafe_precondition!
check inside of
https://github.com/rust-lang/rust/blob/master/library/std/src/sys/thread_local/native/lazy.rs#L66
fails,
because earlier LazyStorage::new
failed to put State::Initial
into an adress with correct alignment for State<T,D>
https://github.com/rust-lang/rust/blob/master/library/std/src/sys/thread_local/native/lazy.rs#L37,
where T
is RefCell<MockedBlockchain<Memory = MockedMemory>>
earlier we had a similar issue with aarch64-apple-darwin
with somewhat similar panics:
0x102f569ec - core::ptr::replace::he9e2c42f9fe96d9b
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ub_checks.rs:73:17
0x102f569ec - core::ptr::mut_ptr::<impl *mut T>::replace::hcc3425bab485c184
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ptr/mut_ptr.rs:1560:18
0x102f569ec - std::sys::thread_local::lazy::LazyKeyInner<T>::take::h11414c1b64d26b2a
at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/sys/thread_local/mod.rs:102:30
which originated in some register_dtor
functions when trying to run doc-tests
for the same near-sdk-rs
project : https://github.com/rust-lang/rust/issues/126643
It was later confirmed that the issue stopped to manifest itself on exactly the same
rustc 1.79.0 (129f3b996 2024-06-10)
compiler after macos-14-arm64
github actions
vm was updated.
I tried going through the linked issues, but I couldn't figure out how you reproduce the issue (what commands do you run?).
A hypothetical minimal reproducer based on the code you posted doesn't reproduce on my Mac (M2, macOS 14.7):
use std::cell::RefCell;
#[repr(align(16))]
#[derive(Debug)]
struct MockedBlockchain {
_size: [u8; 2512],
}
thread_local! {
static BLOCKCHAIN_INTERFACE: RefCell<MockedBlockchain> = RefCell::new(MockedBlockchain {
_size: [1; 2512],
});
}
fn main() {
BLOCKCHAIN_INTERFACE.with(|b| println!("{b:?}"));
}
So it sounds like it might be something else that is causing the issue?
on unclarified
aarch64-apple-darwin
environment(s)thread_local!
panics on https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1277 .
What do you mean by "unclarified"? Knowing that might help us to reproduce the issue?
@madsmtm you might ask @peitalin from the linked issue about specifics of his environment, where he reproduced it. We didn't reproduce it on our macs either.
The command sequence to run the same scenario which triggered panic would be
# install cargo-near to $CARGO_HOME/bin
curl --proto '=https' --tlsv1.2 -LsSf https://github.com/near/cargo-near/releases/latest/download/cargo-near-installer.sh | sh
cargo near new test
cd test/
cargo test
If the binary installer doesn't work for some reason, it can be installed from source by cargo install cargo-near
@madsmtm the code you tried is much simpler than the one from near-sdk
repo. One from near-sdk
repo uses generic parameter and also some unsafe
to create references to self (without ouroborous). The self-referential part has been there for some odd 3 years, while the generic parameter has been added recently, a couple of months ago.
@peitalin:
-Clinker=/usr/bin/ld
(just to make sure that the problem isn't with a Clang installed via Homebrew or similar)?sw_vers
)? Because then I might be able to whip up a VM for that version.Linking possibly related https://rust-lang.zulipchat.com/#narrow/channel/122651-general/topic/Weird.20crash.20using.20thread.20locals.20Apple.20Silicon.20debug / https://github.com/zama-ai/tfhe-rs/issues/1687.
So I'll also ask:
ld
are you using?@madsmtm I've just tried your example with RUSTFLAGS="-Clinker=/usr/bin/ld" cargo run
and reproduced the errors.
My system specs are:
Xcode version: Version 15.2 (15C500b)
clang++ -v
:
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin22.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
$ ld -v
@(#)PROGRAM:ld PROJECT:dyld-1015.6
BUILD 20:37:42 Aug 14 2023
configured to support archs: armv6 armv7 armv7s arm64 arm64e arm64_32 i386 x86_64 x86_64h armv6m armv7k armv7m armv7em
will use ld-classic for: armv6 armv7 armv7s arm64_32 i386 armv6m armv7k armv7m armv7em
LTO support using: LLVM version 15.0.0 (static support for 29, runtime is 29)
TAPI support using: Apple TAPI version 15.0.0 (tapi-1500.0.12.3)
Library search paths:
Framework search paths:
sw_vers
ProductName: macOS
ProductVersion: 13.5.1
BuildVersion: 22G90
I've added a full backtrace of your code snippet and system specs here: https://github.com/peitalin/madsmtm-demo
Nice! I reproduced it now (both in a VM on macOS 13.5, and on macOS 14.7), it's a problem with the linker in Xcode 15.0 and Xcode 15.1, which unfortunately happens to be the latest supported on macOS 13.5.
Copy [this](https://github.com/rust-lang/rust/issues/132704#issuecomment-2466877452) into `foo.rs`. ```sh # Command Line Tools for Xcode 15.0/15.1 # The latest Command Line Tools installation overwrites this path rustc foo.rs -Clinker=/Library/Developer/CommandLineTools/usr/bin/ld && ./foo # Xcode 15.0 rustc foo.rs -Clinker=/Applications/Xcode\ 15.0.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld && ./foo # Xcode 15.1 rustc foo.rs -Clinker=/Applications/Xcode\ 15.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld && ./foo ```
Things do seem to work in Rust 1.78 though, I'll run a bisection, and see if I can find the offending commit (to figure out if this is indeed a regression, and if there's anything we can do about it).
Rust recently gained extra assertions in debug builds to detect Undefined Behavior caused by misaligned pointers. That is likely why this started failing. Already on older versions of Rust, this code had Undefined Behavior, and could be the cause of subtle misbehavior -- newer Rust just makes this misbehavior easier to detect.
If the linker causes some pointer to be null/misaligned, that sounds like a critical bug in that linker. So it'd be good to know exactly which versions of the linker are affected.
Cc @saethlin, your checks found another bug. :)
Nice! I reproduced it now (both in a VM on macOS 13.5, and on macOS 14.7), it's a problem with the linker in Xcode 15.0 and Xcode 15.1, which unfortunately happens to be the latest supported on macOS 13.5.
Is this issue documented/tracked somewhere public? Are there newer linker versions available? I don't know anything about macOS, I am just surprised that an OS released just one year ago wouldn't be able to run the latest version of the linker.
EDIT: Actually https://developer.apple.com/documentation/xcode-release-notes/xcode-15_2-release-notes says that XCode 15.2 should work on macOS 13.5.
It takes quite a lot of time to test these things, Xcode is large ;)
But lemme write down exactly what I've tested so far:
And in between these there's a lot of betas and release candidates.
EDIT: Actually https://developer.apple.com/documentation/xcode-release-notes/xcode-15_2-release-notes says that XCode 15.2 should work on macOS 13.5.
Yeah, I should've clarified that I only really tested the Command Line Tools (and no Command Line Tools for Xcode 15.2 exist).
Is this issue documented/tracked somewhere public?
Not that I know of? Maybe rdar://110340167, though I don't have access to that myself, just guessing based on a similar issue found by googling?
Are there newer linker versions available?
With a newer Xcode, yes. Without, no (not practically, at least).
I don't know anything about macOS, I am just surprised that an OS released just one year ago wouldn't be able to run the latest version of the linker.
Xcode generally only wants to be run on the latest macOS release. Usually it works fine in older versions, but you have to edit a configuration file and run an "unsupported" build, so most people don't.
Also linking possibly related https://github.com/rust-lang/rust/issues/90959.
You're probably right that this is / has been UB for a long time. A bisection led me to a6cdd81eff52566542cecdc1ce381dbe42cf77fb, which sounds very much like a red herring (the only thing of relevance that PR changed is a bit of panic code, which might explain why alignment would change between that and the previous commit).
EDIT: Yeah, definitely a red herring, I've later found previous commits that also exhibit the problem. So not really something that can be bisected.
I'll try to take a look at the difference in assembly between a version that succeeds and one that fails, but it's difficult, since it seems like it requires the standard library to be linked in a certain way (I couldn't trigger the issue with -Zbuild-std
).
Yeah this looks exactly like what I previously poked at on Zulip, it's somewhat comforting that this seems to depend on the Apple Xcode bag of tools.
which sounds very much like a red herring
FWIW, the MIR inliner used to use item hash comparisons to prevent query cycles, which meant that literally any edit to a crate would cause the inliner to make different decisions. The only thing you can do in this scenario is turn off the MIR inliner with RUSTFLAGS=-Zinline-mir=no
or RUSTFLAGS=-Zmir-enable-passes=-Inline
. But of course the standard library in the range you're trying to bisect through has been precompiled with the unstable MIR optimization pipeline.
So you'd need to make -Zbuild-std
work to get a usable bisection. By default, it builds the standard library with the optimization settings for the rest of the build. So if you don't set --release
, you get a quite different standard library build. Are you just using cargo run -Zbuild-std
without --release
?
FWIW, the MIR inliner used to use item hash comparisons to prevent query cycles, which meant that literally any edit to a crate would cause the inliner to make different decisions. The only thing you can do in this scenario is turn off the MIR inliner with
RUSTFLAGS=-Zinline-mir=no
orRUSTFLAGS=-Zmir-enable-passes=-Inline
. But of course the standard library in the range you're trying to bisect through has been precompiled with the unstable MIR optimization pipeline.
That explains a lot, thanks for the tips!
Are you just using
cargo run -Zbuild-std
without--release
?
Yup, couldn't reproduce this issue at all with --release
, but perhaps that's just 'cause I didn't try hard enough.
I tried this code:
https://github.com/near/near-sdk-rs/blob/master/near-sdk/src/environment/mock/mod.rs#L11-L16
I expected to see this happen:
thread_local!
initialization code forRefCell::new(MockedBlockchain::new)
worksInstead, this happened:
on unclarified
aarch64-apple-darwin
environment(s)thread_local!
panics on https://doc.rust-lang.org/src/core/ptr/mod.rs.html#1277 .The issue doesn't reproduce on other
aarch64-apple-darwin
real macs, and onmacos-14-arm64
github actions vm.Original issue: https://github.com/near/near-sdk-rs/issues/1252
Meta
rustc --version --verbose
:backtrace from original issue: https://github.com/near/near-sdk-rs/issues/1252#issuecomment-2454692564