trifectatechfoundation / zlib-rs

A safer zlib
zlib License
139 stars 15 forks source link

UB in `inflate::writer::Writer::copy_chunk_unchecked()` #232

Closed inahga closed 4 weeks ago

inahga commented 1 month ago

Fuzzing under Miri reveals a UB bug in inflate::writer::Writer::copy_chunk_unchecked() under a variety of inputs.

I've attached one file that triggers this, along with the test to reproduce:

#[cfg(test)]
mod tests {
    #[cfg(miri)]
    use {
        crate::run,
        libfuzzer_sys::arbitrary::Unstructured,
        libz_rs_sys::{inflate, inflateEnd, inflateInit2_, z_stream, zlibVersion},
        rstest::rstest,
        std::{fs::File, io::Read, path::PathBuf},
        zlib_rs::{InflateFlush, ReturnCode},
    };

    #[test]
    #[cfg(miri)]
    fn inahga() {
        let mut file =
            File::open("corpus/uncompress2/014855234084d1fa5bd96114988af519ededaa07.gz").unwrap();
        let mut input = Vec::new();
        file.read_to_end(&mut input).unwrap();

        let mut stream = z_stream::default();

        let err = unsafe {
            inflateInit2_(
                &mut stream,
                15 + 32,
                zlibVersion(),
                size_of::<z_stream>() as _,
            )
        };
        assert_eq!(ReturnCode::from(err), ReturnCode::Ok);

        let mut output = vec![0; input.len()];
        let input_len: u64 = input.len().try_into().unwrap();
        stream.next_out = output.as_mut_ptr();
        stream.avail_out = output.len().try_into().unwrap();
        stream.next_in = input.as_ptr();
        stream.avail_in = input.len().try_into().unwrap();

        while input_len.checked_sub(stream.total_in).unwrap() > 0 {
            let err = unsafe { inflate(&mut stream, InflateFlush::Finish as _) };
            match ReturnCode::from(err) {
                ReturnCode::StreamEnd => {
                    break;
                }
                ReturnCode::BufError | ReturnCode::Ok => {
                    let add_space: u32 = Ord::max(1024, output.len().try_into().unwrap());
                    output.extend(core::iter::repeat_n(0, add_space.try_into().unwrap()));

                    // If extend() reallocates, it may have moved in memory.
                    stream.next_out = output.as_mut_ptr();
                    stream.avail_out += add_space;
                }
                err => {
                    panic!("{:?}", err);
                }
            }
        }

        unsafe { inflateEnd(&mut stream) };
    }
}

014855234084d1fa5bd96114988af519ededaa07.gz

error: Undefined Behavior: constructing invalid value at .0[1]: encountered uninitialized memory, but expected an integer
    --> /home/inahga/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1236:5
     |
1236 |     dst
     |     ^^^ constructing invalid value at .0[1]: encountered uninitialized memory, but expected an integer
     |
     = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
     = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
     = note: BACKTRACE on thread `tests::inahga`:
     = note: inside `std::arch::x86_64::_mm_loadu_si128` at /home/inahga/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/x86/sse2.rs:1236:5: 1236:8
     = note: inside `<std::arch::x86_64::__m128i as zlib_rs::inflate::writer::Chunk>::load_chunk` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate/writer.rs:286:9: 286:57
     = note: inside `zlib_rs::inflate::writer::Writer::<'_>::copy_chunk_unchecked::<std::arch::x86_64::__m128i>` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate/writer.rs:234:25: 234:43
     = note: inside `zlib_rs::inflate::writer::Writer::<'_>::extend_from_window_help::<std::arch::x86_64::__m128i>` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate/writer.rs:120:17: 120:101
     = note: inside `zlib_rs::inflate::writer::Writer::<'_>::extend_from_window` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate/writer.rs:87:20: 87:94
     = note: inside `zlib_rs::inflate::inflate_fast_help` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate.rs:1714:29: 1714:88
     = note: inside `zlib_rs::inflate::State::<'_>::len` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate.rs:1110:20: 1110:46
     = note: inside `zlib_rs::inflate::State::<'_>::match_` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate.rs:1311:13: 1311:23
     = note: inside `zlib_rs::inflate::State::<'_>::dispatch` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate.rs:538:28: 538:41
     = note: inside `zlib_rs::inflate::inflate` at /home/inahga/Projects/zlib-rs/zlib-rs/src/inflate.rs:1970:19: 1970:35
     = note: inside `libz_rs_sys::inflate` at /home/inahga/Projects/zlib-rs/libz-rs-sys/src/lib.rs:321:9: 321:49

Ran under MIRIFLAGS="-Zmiri-disable-isolation -Zmiri-tree-borrows" cargo miri nextest run inahga. This is one such input that invokes the UB. It's invoked from match_() -> len(), but I've seen it triggered elsewhere, basically anywhere extend_from_window() gets invoked. Let me know if you'd like more inputs to play with, though you can also generate some inputs and run them under Miri by running https://github.com/trifectatechfoundation/zlib-rs/pull/231,

This only occurs on the main branch.

From some dbg!()ing, I think this happens when we're copying chunks of data close to the end of the input. The last chunk of data is often not a full chunk, so we copy over uninitialized bytes.

I see we have logic in inflate/window.rs:130 to add some padding such that these reads don't read uninitialized bytes, but that logic isn't actually hit in this case. It's conditional on update_checksum, which is only true when wrap & 4 == 0. Maybe this logic was required for SIMD checksumming, but now it needs to be unconditional for SSE copies? Indeed when I make it unconditional, Miri no longer complains, but I'm not sure what else that breaks :wink:

We may also want to document the precondition that src..src.add(length) all consists of initialized bytes, in copy_chunk_unchecked() (or perhaps even have it operate on *mut u8, since load_chunk() will always invoke UB if used on uninitialized bytes).

folkertdev commented 1 month ago

just as a note, instead of using repeat_n, this is probably faster and clearer

output.resize(output.len() + add_space as usize, 0);

and also this works with our MSRV (repeat_n just got stabilized in 1.82.0)