jetscii returns incorrect result in mysterious circumstances

brson commented 6 years ago

I've reduced this as much as I can, but it disappears if I reduce it any further.

The code is here: https://github.com/brson/searchtest/blob/jetsciibug/src/lib.rs#L16

On the jetsciibug branch of my searchtest repo. The bug shows up in a benchmark, where jetscii returns the wrong result. This happens on both beta and nightly, the versions of which are printed below. This is using RUSTC_BOOTSTRAP to test benchmarking on beta.

This happens with target-cpu=x86_64 but not with target-cpu=skylake. I haven't tested any other targets.

brian@DESKTOP-UCV672I:~/searchtest⟫ RUSTC_BOOTSTRAP=1 RUSTFLAGS="-C target-cpu=x86-64" cargo +beta bench -- --nocapture
    Finished release [optimized] target(s) in 0.03s
     Running target/release/deps/searchtest-ae1a210660c35b0a

running 1 test
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some(64)`,
 right: `Some(419)`', src/lib.rs:20:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
test find_from_set_jetscii_bytes_lipsum_emph ... FAILED

failures:

failures:
    find_from_set_jetscii_bytes_lipsum_emph

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: bench failed
101 brian@DESKTOP-UCV672I:~/searchtest⟫ rustc +beta --version
rustc 1.27.0-beta.5 (84b5a46f8 2018-05-15)
brian@DESKTOP-UCV672I:~/searchtest⟫ RUSTC_BOOTSTRAP=1 RUSTFLAGS="-C target-cpu=x86-64" cargo +nightly bench -- --nocapture
   Compiling searchtest v0.1.0 (file:///home/brian/searchtest)
    Finished release [optimized] target(s) in 1.75s
     Running target/release/deps/searchtest-191788c2bec2b787

running 1 test
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Some(64)`,
 right: `Some(419)`', src/lib.rs:20:9
note: Run with `RUST_BACKTRACE=1` for a backtrace.
test find_from_set_jetscii_bytes_lipsum_emph ... FAILED

failures:

failures:
    find_from_set_jetscii_bytes_lipsum_emph

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out

error: bench failed
101 brian@DESKTOP-UCV672I:~/searchtest⟫ rustc +nightly --version
rustc 1.27.0-nightly (f0fdaba04 2018-05-15)

brson commented 6 years ago

The bug goes away if I reduce the set of search chars below 14.

brson commented 6 years ago

I'm hoping either your or I can figure out how to reduce this for an upstream bug report. Seems like it could be a rust codegen bug.

brson commented 6 years ago

Ok, now I've seen it even on skylike with a different mixture of benchmarks.

brson commented 6 years ago

The bug seems to go away if I pass '-C incremental=0'.

cc @michaelwoerister This library leverages the new simd instructions and sometimes returns the wrong results. It seems if I pass "RUSTFLAGS=-C target-cpu=x86_64" it always fails, and if I pass `RUSTFLAGS=-C target=cpu=x86_64 -C incremental=0" it always succeds, but it's been a bit of a heisenbug.

brson commented 6 years ago

But if I pass incremental=1 it also succeeds. Maybe I don't know the correct values or incremental. /me definitely confused.

brson commented 6 years ago

incremental does seem to make it work consistently.

michaelwoerister commented 6 years ago

Hey @brson! Incremental compilation is best controlled via the CARGO_INCREMENTAL env var (which can be set to either 0 or 1). If you need to use rustc directly then -Cincremental takes a path to the cache directory to use (e.g. /tmp/xyz). If you passed 1 or 0, you probably have a cache directory called "1" or "0" somewhere now.

This seems to be some kind of miscompilation. We've had a few cases like this lately related to ThinLTO, which incremental disables by default. You could try compiling with -C codegen-units=1 (which also disables ThinLTO) to see if it has to do with incremental compilation specifically.

brson commented 6 years ago

@michaelwoerister ah thanks for the help. Indeed, codegen-units=1 makes the problem reliably disappear; codegen-units=6 makes it reliably appear.

brson commented 6 years ago

So the working command I'm using is

RUSTFLAGS="-C target-cpu=x86-64 -C codegen-units=1" cargo +nightly bench

And the broken command is

RUSTFLAGS="-C target-cpu=x86-64 -C codegen-units=6" cargo +nightly bench

And with codegen-units=6, changing the target-cpu to skylake doesn't fix the problem per my original report.

brson commented 6 years ago

cc @alexcrichton this is the codegen-units miscompilation I talked about on irc. The branch that triggers the problem is in the op, and my previous comment shows the working and non-working cargo invocations.

alexcrichton commented 6 years ago

Hm with rustc 1.27.0-nightly (2f2a11dfc 2018-05-16) I'm unable to reproduce with the two above commands using https://github.com/brson/searchtest/commit/76682a939e3becbefaa088304f223f3a58ca28f9

alexcrichton commented 6 years ago

(on Ubuntu 16.04 as well)

brson commented 6 years ago

Hm :/ I can repro on the same nightly as you, and I can also repro with rustc 1.27.0-beta.5 (84b5a46f8 2018-05-15) (setting RUSTC_BOOTSTRAP).

I'm on Ubuntu 16.04.4 on WSL.

brson commented 6 years ago

I can't reproduce it on the MSVC nightly...

alexcrichton commented 6 years ago

Hm FWIW I can also reproduce on WSL (same version myself). I can't reproduce it anywhere else though? I wonder if this is a WSL weird bug?

jonathanstrong commented 6 years ago

I'm experiencing something like this at the moment, where tests are passing or failing at seemingly random. I can't identify any pattern or example that reliably fails, or compilation flags that change it. I've been writing more tests after first experiencing it, to troubleshoot, and it seems almost as if adding or subtracting code can trigger different runtime behavior. Obviously I have no clue how that could be the case. Fwiw I have memchr and an iterator-based find-like function (on &[u8]) to compare with verify the results from bytes! is erroneous.

Edit: using jetscii master on Ubuntu 16.04 w/ an i7-3630QM

shepmaster commented 6 years ago

@jonathanstrong to clarify...

are you using the Windows Subsystem for Linux (WSL)?
or compilation flags that change it — including "-C target-cpu and -C codegen-units that @brson mentioned?
Which version of Rust are you using?

One thing you could try is to download this repo and run the tests. There's a quickcheck test that validates the SIMD behavior against the baseline. You can run with QUICKCHECK_TESTS set to a "big" number to just let it go for a long time.

jetscii master

I've released 0.4.1, sorry for the delay!

jonathanstrong commented 6 years ago

No windows involved, Ubuntu 16.04. rustc 1.28.0-nightly (e3bf634e0 2018-06-28). (I was using master for Substring, glad to see it's released! Unfortunately I gave up after a few hours of perplexing errors and just used memchr and std's .find. Here's an example of the type of code that was causing this:

fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|window| window == needle)
}

#[test]
fn parse_json() {
    let json = br#"{"type":"one","user_id":"cb0c70d7-e060-47b9-abb4-83d310a1fec8","action":"cancel","product_id":"ABC-123","price":"29.50","sequence":9876543}"#;
    let product_id = jetscii::ByteSubstring::new(b"\"product_id\":\"");
    let sequence = jetscii::ByteSubstring::new(b"\"sequence\":");

    let pid: &[u8] = product_id.find(json).and_then(|i| {
        println!("json[(i+14)..] = {} ; i = {}", unsafe { ::std::str::from_utf8_unchecked(&json[(i+14)..]) }, i);

        bytes!(b'"').find(&json[(i + 14)..]).map(|j| {
            let k = find_subsequence(&json[(i+14)..], b"\"");
            let l = memchr::memchr(b'"', &json[(i+14)..]);
            println!("json[(i+14)..(i+14+j)] = {} ; i = {}, j = {}, find_subsequence(..) = {:?}, memchr(..) = {:?}",
                     unsafe { ::std::str::from_utf8_unchecked(&json[(i+14)..(i+14+j)]) }, i, j, k, l);
            &json[(i + 14)..(i + 14 + j)]
        })
    }).unwrap();
    assert_eq!(pid, b"ABC-123", "pid = {}", unsafe { ::std::str::from_utf8_unchecked(pid) });

    let seq: Option<u64> = sequence.find(json).and_then(|i| {
        println!("json[(i+11)..] = {} ; i = {}", unsafe { ::std::str::from_utf8_unchecked(&json[(i+11)..]) }, i);
        bytes!(b',', b'}').find(&json[(i + 11)..]).and_then(|j| {
            atoi::atoi(&json[(i + 11)..(i + 11 + j)])
        })
    });
    assert_eq!(seq, Some(9876543));
}

As you can see, in the closure following the first call to bytes! I calculated the same thing with find_subsequence and memchr for troubleshooting, both of those functions were returning the same answer (Some(7) iirc) while bytes! was returning something way off (Some(64) iirc). And again, the other strange thing was that the same exact tests, with hardcoded text (like the example above), would sometimes pass, sometimes fail, on different runs (differently compiled runs).

I had cloned and run the tests last night, they all passed. Just ran $ QUICKCHECK_TESTS=1000000 cargo test and it also passed.

shepmaster commented 6 years ago

This is looking less and less like a compiler bug. I've extended my property-based tests and now I've got the following code which pretty reliably reproduces the problem:

const AAAA: u8 = 0x01;
const BBBB: u8 = 0x02;

let needle = Needle {
    data: [
        AAAA, BBBB, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    ],
    len: 2,
};
let haystack = Haystack {
    data: vec![
        AAAA, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        BBBB, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00,
    ],
    start: 1,
};

let haystack = haystack.with_start();

let us = unsafe { Bytes::new(needle.data, needle.len as i32).find(haystack) };
assert_eq!(Some(15), us);

It's strange because removing any of those trailing bytes in the haystack cause the test to pass. I'm sure I've got some exciting edge case somewhere...

shepmaster commented 6 years ago

Ok, I have a fix up in #23. The TL;DR:

The PCMPxSTRx instructions operate on a 16-byte wide vector. When the input data isn't aligned to a 16-byte boundary, we use a variant of the instruction which returns a mask of all matches for the first 16-byte aligned block. If a false positive occurs in the leading junk of that block, we handle that by shifting the mask data.

MXXX_XXXX_XXXX_XXXX
^ matches      ^
               | but data starts here

This case would return a 0b0000_0000_0000_0001, which we would then right-shift by 12 to ignore the leading junk. This will generate a 0, which we then ask the number of leading zeros, which is 64 due to the size of the type. This is what generated the Some(64) in both reported cases.

This was partially hidden because we already had a half of a check for this, as we checked to make sure that the returned index was less than the length of the haystack (to prevent matching after the haystack).

The new property-based test checks misaligned input haystacks more thoroughly and a separate unit test was added as well.

shepmaster commented 6 years ago

Just waiting on some performance checks to make sure an extra conditional won't be a massive perf loss.

jonathanstrong commented 6 years ago

this appeared to fix the problem in at least one of the places I was encountering this.

jonathanstrong commented 6 years ago

across the board, no unusual errors using the "misalignment-false-positive" branch. much better!

shepmaster / jetscii

jetscii returns incorrect result in mysterious circumstances #19