rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.55k stars 12.74k forks source link

`-Zdylib-lto` with ThinLTO is broken on windows-msvc #109114

Open Noratrieb opened 1 year ago

Noratrieb commented 1 year ago

Found by a miscompilation inside the shipped rustc binaries on stable windows-msvc #109067

Here's a partially minimized sample. It can maybe get smaller than this, but after a point it seems to be fairly sensitive to the code shifting around.

use std::ops::Range;
use std::str::Chars;

#[derive(Debug)]
enum EscapeError {
    UnskippedWhitespaceWarning,
}

fn scan_escape(chars: &mut Chars<'_>) -> Result<char, EscapeError> {
    let res = match chars.next().unwrap() {
        _ => panic!("invalid"),
    };
    // Ok(res)
}

fn unescape_str_or_byte_str<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.chars();

    // The `start` and `end` computation here is complicated because
    // `skip_ascii_whitespace` makes us to skip over chars without counting
    // them in the range computation.
    while let Some(c) = chars.next() {
        let start = src.len() - chars.as_str().len() - c.len_utf8();
        let res = match c {
            '\\' => {
                match chars.clone().next() {
                    Some('\n') => {
                        // Rust language specification requires us to skip whitespaces
                        // if unescaped '\' character is followed by '\n'.
                        // For details see [Rust language reference]
                        // (https://doc.rust-lang.org/reference/tokens.html#string-literals).
                        skip_ascii_whitespace(&mut chars, start, callback);
                        continue;
                    }
                    _ => scan_escape(&mut chars),
                }
            }
            _ => Ok(c)
        };
        let end = src.len() - chars.as_str().len();
        callback(start..end, res);
    }

    fn skip_ascii_whitespace<F>(chars: &mut Chars<'_>, start: usize, callback: &mut F)
    where
        F: FnMut(Range<usize>, Result<char, EscapeError>),
    {
        let tail = chars.as_str();
        println!("tail={tail:?}");
        let first_non_space = tail
            .bytes()
            .position(|b| b != b' ' && b != b'\t' && b != b'\n' && b != b'\r')
            .unwrap_or(tail.len());
        println!("first_non_space={first_non_space:?} start={start:?}", );
        if tail[1..first_non_space].contains('\n') {
            // The +1 accounts for the escaping slash.
            // let end = start + first_non_space + 1;
            // callback(start..end, Err(EscapeError::MultipleSkippedLinesWarning));
        }
        let tail = &tail[first_non_space..];
        println!("tail={tail:?}");
        if let Some(c) = tail.chars().nth(0) {
            // For error reporting, we would like the span to contain the character that was not
            // skipped. The +1 is necessary to account for the leading \ that started the escape.
            // println!("{:?}", '£'.is_whitespace());
            println!("first char is {c:?}");
            let end = start + first_non_space + c.len_utf8() + 1;
            println!("end is {end:?}");
            if c.is_whitespace() {
                println!("{c:?} is whitespace, err range is {:?}", start..end);
                callback(start..end, Err(EscapeError::UnskippedWhitespaceWarning));
            }
        }
        *chars = tail.chars();
    }
}

fn main() {
    unescape_str_or_byte_str("\\\n£",  &mut |_range, result| {
        eprintln!("cb={result:?}", );
    });
}

Build with rustc -Zdylib-lto -Clto=thin -Cprefer-dynamic=yes -Copt-level=2 main.rs You should see:

tail="\n£"
first_non_space=1 start=0
tail="£"
first char is '£'
end is 4
'£' is whitespace, err range is 0..4
cb=Err(UnskippedWhitespaceWarning)
cb=Ok('£')

Originally posted by @ehuss in https://github.com/rust-lang/rust/issues/109067#issuecomment-1466936353

apiraino commented 1 year ago

WG-prioritization assigning priority (Zulip discussion). Possibly an overly "pessimistic" prioritization label, so feel free to reassess the impact.

@rustbot label -I-prioritize +P-high

lqd commented 1 year ago

The following is enough to trigger the issue (on the latest nightly).

fn main() {
    let mut chars = "£".chars();
    let c = chars.next().unwrap();

    if c.is_whitespace() {
        panic!("{:?} is whitespace", c);
    }
}

rustc -Zdylib-lto -Clto=thin -Cprefer-dynamic=yes src/main.rs && ./main.exe

thread 'main' panicked at ''£' is whitespace', src/main.rs:6:9

It's still sensitive to changes at this point, e.g. adding a println in an else will remove the miscompile.

I'll start working on removing libstd/libcore for an MCVE now.

lqd commented 1 year ago

At opt-level >= 1, a single call is enough.

fn main() {
    if '£'.is_whitespace() {
        panic!("'£' is whitespace");
    }
}

RUSTFLAGS="-Zdylib-lto -Clto=thin -Cprefer-dynamic -Cembed-bitcode -Copt-level=1" cargo +nightly-2023-03-14 run -q

rustc -Zdylib-lto -Clto=thin -Cprefer-dynamic -Copt-level=1 src/main.rs && ./main.exe

is_whitespace references static data in libcore, so inlining that while keeping the regression likely requires having a structure similar to the rustc bin + rustc_driver shared library. Naively doing that unfortunately stops this regression from happening.

dwattttt commented 10 months ago

I'm more familiar with Windows than LLVM/rustc's internals, but there's a few issues I see in a compiled binary based off the reproducers. The comparison operation to check whether a unicode character is whitespace involves indexing a table provided by core/std by the character; in binaries with this miscompilation the comparison ends up referencing the middle of another function, and also does not appear to index by the character. Further, the static whitespace data is marked as needing to be referenced by the built binary (there's an import descriptor for it), but there exists a thunk function for the data; it believes the import is a function to be called.

dwattttt commented 10 months ago

Yep, it's due to imported data being used. It can be reproduced without involving stdlib; you need to produce a dylib dependency that exports data, e.g. Cargo.toml:

crate-type = ["dylib"]

src/main.rs:

pub static EXPORTED_DATA: usize = 0x10203040;

Then use it in another crate .cargo/config:

[target.x86_64-pc-windows-msvc]
rustflags = [
    "-Zdylib-lto",
    "-Clto=thin",
    "-Cprefer-dynamic=yes",
    "-Cembed-bitcode",
]

src/main.rs:

fn main() {
    assert_eq!(0x10203040, dep::EXPORTED_DATA);
}

If this is built with optimisation it inlines the data and does not exhibit the bug.

dwattttt commented 10 months ago

Dumping the LLVM IR gives

> cargo llvm-ir rust_use_dep::main --build-type debug
define internal void @rust_use_dep::main() unnamed_addr #1 {
start:
%_2 = load i32, ptr @_ZN12rust_dyn_dep13EXPORTED_DATA17h4253d5633b678791E, align 4, !noundef !4
call void @ExitProcess(i32 %_2) #5
unreachable
}

_ZN12rust_dyn_dep13EXPORTED_DATA17h4253d5633b678791E is a thunk, it JMP's to the imported address from the dependency, so it definitely thinks the import is a function; if the import was a function this would work. I guess since this is the LLVM IR emitted by rustc the issue is somewhere inside rustc.

bjorn3 commented 10 months ago

https://github.com/rust-lang/rust/blob/e21f4cd98fcf03fb7895f13421699f58132e4beb/compiler/rustc_codegen_llvm/src/consts.rs#L292-L296 seems suspicious. I can't find where we emit the load though.

@dwattttt can you dump the llvm-ir before any optimizations (you need -Cno-prepopulate-passes for this)

dwattttt commented 10 months ago

Adding -Cno-prepopulate-passes to .cargo/config.toml didn't change anything cargo llvm-ir spat out. Building with debug still produced

> cargo llvm-ir rust_use_dep::main --build-type debug
define internal void @rust_use_dep::main() unnamed_addr #1 {
start:
%_2 = load i32, ptr @_ZN12rust_dyn_dep13EXPORTED_DATA17h4253d5633b678791E, align 4, !noundef !4
call void @ExitProcess(i32 %_2) #5
unreachable
}

while building with release propogated the data, so no load occurs:

> cargo llvm-ir rust_use_dep::main
define internal void @rust_use_dep::main() unnamed_addr #4 {
start:
tail call void @ExitProcess(i32 noundef 1020304050) #8
unreachable
}

FTR I changed the source being compiled from an assert to an ExitProcess to try produce a no_std reproducer but got too stuck involving a dylib.

dwattttt commented 10 months ago

I ran a build through a debug logging rustc, I've attached the log. I didn't see anything particularly helpful, but I'm not sure what to look for. build_log.txt

dwattttt commented 10 months ago

This looks relevant: https://github.com/rust-lang/rust/commit/296489c89268e56abb8f6050842d006b16ed4f09

It looks like we're losing a dllimport statement because we're assuming it's static linking somewhere around here?

EDIT: I see it's just what you'd already found. Their issue looks like it matches what we're observing though.

dwattttt commented 10 months ago

Reading through a bunch of the previous issues around, it looks like this issue is coming down to a difference in how dllimport is handled between lld-link.exe & link.exe (at least, as far as I can tell without trying to dig into each tool).

The fix in https://github.com/rust-lang/rust/pull/103353 was applied to stop emitting dllimport during ThinLTO; lld-link.exe was emitting basically the same problematic code structure we see here (https://github.com/rust-lang/rust/issues/81408#issuecomment-1022252418 shows the compiled binary attempting to call what is a data address to load through).

Unfortunately, link.exe is emitting that problematic code structure when we don't emit dllimport in this situation. So the fix for lld-link.exe is causing the same issue in link.exe.

I tried removing self.tcx.sess.lto() != Lto::Thin from the conditional (essentially reverting https://github.com/rust-lang/rust/pull/103353), and link.exe emits a correct load pattern. I also added "-Clinker=lld" to my reproducer, and it also emitted a correct load.

I don't know whether lld-link has changed to cause this issue to disappear, or if it's just been a matter of different optimisations being triggered, so as far as I see our options are:

  1. Remove the "are we doing ThinLTO" clause added by https://github.com/rust-lang/rust/pull/103353, and remove the regression test added.
  2. Add a further check to determine whether we're going to link with lld-link.exe to confine the fix in https://github.com/rust-lang/rust/pull/103353 to just lld-link.exe (assuming we even know what linker will be invoked here?)

Option 1 needs further testing to ensure we're not breaking linking with lld + ThinLTO. There's been a few reports of this issue, so there's a few things we can build & check to get some confidence that it works.

dwattttt commented 10 months ago

Handling dllimport correctly seems to be a complicated topic; https://github.com/rust-lang/rust/issues/27438 has a lot of conversation about the impact of it.