rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.1k stars 12.69k forks source link

rustc 1.18.0 misaligned loads and stores on SPARC #43346

Closed dhduvall closed 7 years ago

dhduvall commented 7 years ago

My apologies for not catching this sooner.

The struct alignment and packing that landed (well, was re-enabled) with 1.18.0 produces either improperly aligned structs or improperly aligned loads and stores (or both) on SPARC. The following test program demonstrates it (extracted from where I originally saw the compiler die):

pub struct Blake2bCtx {
    a: [u8; 8],
    b: u16,
    c: u8,
}

#[inline(never)]
fn blake2b_compress(ctx: &mut Blake2bCtx) {

    // Re-interpret the input buffer in the state as an array of little-endian
    // u64s, converting them to machine endianness. It's OK to modify the
    // buffer in place since this is the last time this data will be accessed
    // before it's overwritten.

    let m: &mut [u64; 1] = unsafe {
        let t: &mut [u8; 8] = &mut ctx.a;
        ::std::mem::transmute(t)
    };

    for word in &mut m[..] {
        *word = *word + 1;
    }
}

fn main() {
    let mut ctx = Blake2bCtx {
        a: [0; 8],
        b: 0,
        c: 0,
    };

    blake2b_compress(&mut ctx)
}

The disassembly looks like this:

_ZN5blake16blake2b_compress17h54f2249749134ad6E()
    blake::blake2b_compress::h54f2249749134ad6:      9d e3 bf 80  save      %sp, -0x80, %sp
    blake::blake2b_compress::h54f2249749134ad6+0x4:  f2 5e 20 02  ldx       [%i0 + 0x2], %i1
    blake::blake2b_compress::h54f2249749134ad6+0x8:  b2 06 60 01  add       %i1, 0x1, %i1
    blake::blake2b_compress::h54f2249749134ad6+0xc:  f2 76 20 02  stx       %i1, [%i0 + 0x2]
    blake::blake2b_compress::h54f2249749134ad6+0x10: 81 c7 e0 08  ret
    blake::blake2b_compress::h54f2249749134ad6+0x14: 81 e8 00 00  restore

Note the ldx of [%i0 + 0x2]. That would be valid if %i0 were itself misaligned, but it's not, and ldx requires an 8-byte alignment. I can attach the entire IR if necessary, but I believe these are the relevant bits:

; ModuleID = 'blake.cgu-0.rs'
source_filename = "blake.cgu-0.rs"
target datalayout = "E-m:e-i64:64-n32:64-S128"
target triple = "sparcv9-sun-solaris"

%Blake2bCtx = type { i16, [0 x i8], [8 x i8], [0 x i8], i8, [1 x i8] }
%"core::slice::IterMut<u64>" = type { i64*, [0 x i8], i64*, [0 x i8], %"core::marker::PhantomData<&mut u64>", [0 x i8] }
%"core::marker::PhantomData<&mut u64>" = type {}

@__rustc_debug_gdb_scripts_section__ = internal unnamed_addr constant [34 x i8] c"\01gdb_load_rust_pretty_printers.py\00", section ".debug_gdb_scripts", align 1

; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_ZN5blake16blake2b_compress17h54f2249749134ad6E(%Blake2bCtx* dereferenceable(12)) unnamed_addr #0 !dbg !5 {
bb6:
  tail call void @llvm.dbg.value(metadata %Blake2bCtx* %0, i64 0, metadata !23, metadata !57), !dbg !58
  tail call void @llvm.dbg.value(metadata %Blake2bCtx* %0, i64 0, metadata !24, metadata !57), !dbg !59
  %1 = getelementptr inbounds %Blake2bCtx, %Blake2bCtx* %0, i64 0, i32 2, !dbg !60
  %2 = bitcast [8 x i8]* %1 to i64*, !dbg !61
  tail call void @llvm.dbg.value(metadata %"core::slice::IterMut<u64>"* undef, i64 0, metadata !39, metadata !62), !dbg !63
  tail call void @llvm.dbg.value(metadata i64* %2, i64 0, metadata !54, metadata !57), !dbg !64
  %3 = load i64, i64* %2, align 8, !dbg !65
  %4 = add i64 %3, 1, !dbg !65
  store i64 %4, i64* %2, align 8, !dbg !65
  tail call void @llvm.dbg.value(metadata %"core::slice::IterMut<u64>"* undef, i64 0, metadata !39, metadata !62), !dbg !63
  ret void, !dbg !66
}

I don't see anything obviously wrong in the code, but my knowledge here is pretty slim, so if it's obvious to someone else, I'd love the help. Otherwise, I'm probably going to be putting debugging statements into rustc::ty::layout::Struct::new, hope I'm poking around in the right area, and doing a lot of recompiling.

cc @binarycrusader

jonas-schievink commented 7 years ago

You're transmuting from a u8 array, which has no alignment requirements, to a u64 array, which has an alignment requirement of 8. There was no guarantee that this would work, even before that patch, since the Blake2bCtx struct only has an alignment of 2 due to the u16 inside.

dhduvall commented 7 years ago

Perhaps I failed to trim down blake2b.rs correctly, but this is the same error I'm seeing in the compiler, out of https://github.com/rust-lang/rust/blob/master/src/librustc_data_structures/blake2b.rs

Though there the loads are more complex:

    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x14:  ca 5e 00 00  ldx       [%i0], %g5
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x18:  c8 5e 20 08  ldx       [%i0 + 0x8], %g4
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x1c:  c6 5e 20 10  ldx       [%i0 + 0x10], %g3
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x20:  fa 5e 20 18  ldx       [%i0 + 0x18], %i5
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x24:  e0 5e 20 20  ldx       [%i0 + 0x20], %l0
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x28:  f8 5e 20 28  ldx       [%i0 + 0x28], %i4
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x2c:  f6 5e 20 30  ldx       [%i0 + 0x30], %i3
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x30:  d2 5e 20 5a  ldx       [%i0 + 0x5a], %o1
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x34:  c4 5e 20 38  ldx       [%i0 + 0x38], %g2
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x38:  e2 5e 20 40  ldx       [%i0 + 0x40], %l1
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x3c:  d0 5e 20 48  ldx       [%i0 + 0x48], %o0

(See the +0x5a which is where we get the SIGBUS.)

jonas-schievink commented 7 years ago

Ah, ok. That would have worked before the patch, of course.

dhduvall commented 7 years ago

Yes. Or by using -Z fuel=<mod>=0.

dhduvall commented 7 years ago

It doesn't look like I can use RUSTFLAGS to inject -Z fuel=rustc_data_structures::blake2b=0, since my bootstrapping compiler is 1.17.0 and doesn't understand that.

I reimplemented the transmute() in blake2b_compress with a pair of for loops and lots of shifting and oring, and the compiler works again, even if it uses a lot more instructions doing so (and blake2b_selftest() passes). If there's a faster way that's akin to transmute(), but works, I've no idea. I looked at the various alternatives in the transmute() docs, but when I got a warning about a non-scalar cast, I gave up. I don't know how performance-critical this is, anyway.

I could, I suppose, build a patched 1.18.0 and use that for the bootstrap so that the above RUSTFLAGS workaround could then work to build a good compiler against the existing sources, but I keep worrying about the use of -Z going away, and don't really want to depend on that.

I'll leave this open until someone else wants to pick it up. If anyone thinks it'd be worthwhile for me to simply submit a PR with my slow code, and get it code-reviewed into existence as folks have time, I'm happy to do that.

ollie27 commented 7 years ago

I think the simplest workaround would be to add #[repr(C)] to Blake2bCtx. If you can confirm that that works then it would be worth submitting that. There's probably a better way of fixing this though like changing the type of Blake2bCtx.b to [u64; 16] because transmuting from &mut [u64; 16] to &mut [u8; 128] should be safe.

dhduvall commented 7 years ago

Indeed, #[repr(C)] does the trick.