rust-lang / flate2-rs

DEFLATE, gzip, and zlib bindings for Rust
https://docs.rs/flate2
Apache License 2.0
862 stars 159 forks source link

Tree borrows violation occurs when using zlib backend #392

Closed icmccorm closed 5 months ago

icmccorm commented 6 months ago

I've been experimenting with a version of Miri that can execute foreign functions by interpreting the LLVM bytecode that is produced during a crate's build process.

Miri found an instance of undefined behavior in flate2-rs at version 1.0.28. There seems to be a tree borrows violation in the constructors for ZlibEncoder and ZlibDecoder. I encountered this error when running several test cases for another crate, twmap, which depends on flate2. Here’s the full error message that occurs when running the test case custom_test_teeworlds07 from that crate

---- Foreign Error Trace ----

@ %75 = load i32, ptr %74, align 8, !dbg !955

libz-sys-1.1.12/src/zlib/deflate.c:1519:22
libz-sys-1.1.12/src/zlib/deflate.c:1941:13
libz-sys-1.1.12/src/zlib/deflate.c:1003:18
flate2-1.0.28/src/ffi/c.rs:316:27: 316:58
-----------------------------

error: Undefined Behavior: read access through <133277> is forbidden
   |
   = note: read access through <133277> is forbidden
   = help: the accessed tag <133277> is a child of the conflicting tag <133273>
   = help: the conflicting tag <133273> has state Disabled which forbids this child read access
help: the accessed tag <133277> was created here
  --> .../twmap/src/compression.rs:31:23
   |
31 |     let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default());
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: the conflicting tag <133273> was created here, in the initial state Reserved
  --> .../twmap/src/compression.rs:31:23
   |
31 |     let mut encoder = ZlibEncoder::new(Vec::new(), Compression::default());
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: the conflicting tag <133273> later transitioned to Disabled due to a foreign write access at offsets [0x8..0xc]
  --> .../twmap/compression.rs:32:5
   |
32 |     encoder.write_all(data).unwrap();
   |     ^^^^^^^^^^^^^^^^^^^^^^^
   = help: this transition corresponds to a loss of read and write permissions

It seems like zlib creates a cyclic structure on lines 306-307 in deflate.c within the body of the function deflateInit2_.

int ZEXPORT deflateInit2_(z_streamp strm, int level, int method,
                          int windowBits, int memLevel, int strategy,
                          const char *version, int stream_size) {
    ...
    strm->state = (struct internal_state FAR *)s;
    s->strm = strm;
    ...
}

The parameter strm is a given as a mutable borrow from Rust, on line 277 of c.rs in flate2

fn make(level: Compression, zlib_header: bool, window_bits: u8) -> Self {
    unsafe {
        let mut state = StreamWrapper::default();
        let ret = mz_deflateInit2(
            &mut *state,
            ...
        );
        ...
    }
}

If state is mutated through a Deflate or Inflate object before the pointer stored in strm->state is used, then I think that this pointer would become "Disabled," leading to the invalid read.

Byron commented 6 months ago

Thanks a lot for investigating this issue and sharing your findings!

It seems like it would be impossible to find or see this without tooling, and even with your fantastic explanation I am still having trouble to truly understand what's going on, or how to fix this.

However, I hope that doesn't prevent anyone from contributing a fix.

I also wonder how feasible it would be to run our own test suite under Miri - maybe it would also find this issue, or find even more?

icmccorm commented 6 months ago

I was able to recreate this issue in the test deflate::tests::drop_writes. I'll file new issues if I find any other bugs in flate2's test suite.

I think the main problem here is that the interior of StreamWrapper is a Box, which provides a unique aliasing guarantee.

pub struct StreamWrapper {
    pub inner: Box<mz_stream>,
}

One possible solution is to leak the Box in StreamWrapper::default() and make inner a raw pointer. This appears to fix the issue.

Byron commented 6 months ago

That's great to hear! Is this Miri failure be something that could be reproduced on CI as well? The reason I ask is that these issues seem so subtle, I doubt they will stay fixed for long even if all of them would be miraculously fixed today.

One possible solution is to leak the Box in StreamWrapper::default() and make inner a raw pointer. This appears to fix the issue.

Thanks for investigating a solution! Would this leak have the possibility of being amplified? Even if only 16 bytes are leaked, or less, over time the loss will be considerable and might cause long-running processed to run out of memory. Thus I don't know how feasible such a fix would really be in a crate like this.

icmccorm commented 6 months ago

Would this leak have the possibility of being amplified?

We'd just need to implement Drop for StreamWrapper and use Box::from_raw.

Is this Miri failure something that could be reproduced on CI as well?

My extension to Miri is pretty unstable and requires a separate toolchain, so I wouldn't recommend running it in CI. I'm running an evaluation on a bunch of Rust crates that use the FFI, and I'll be publishing my results, but I don't anticipate that my extension will turn into something that's production-ready. However, the Krabcake project is building an equivalent tool. Once development on that is further along, then you'll be able to run Valgrind in CI to keep these errors from popping up again.

Byron commented 6 months ago

Ah, right, I didn't realize it's a custom, bleeding edge extension and agree that CI probably shouldn't rely on it just yet. I also hope that memory profiling tools will become more readily available for everybody to use to make unsafe a little safer.