image-rs / weezl

LZW en- and decoding that goes weeeee!
Apache License 2.0
26 stars 7 forks source link

Check for overflow of `next_code` when adding `burst_size` #30

Closed shutton closed 2 years ago

shutton commented 2 years ago

This addresses a problem in the GIF parser detected during fuzzing of a crate that utilizes image-rs. The issue can be reproduced using this short test program and the attached file (too large to include in the source -- unzip first).

const BAD_GIF: &[u8] = include_bytes!(concat!(
    env!("CARGO_MANIFEST_DIR"),
    "/5220731288420352.gif"
));

fn main() {
    if let Err(e) = image::load_from_memory(BAD_GIF) {
        eprintln!("failed to load: {e}");
    }
}

5220731288420352.gif.zip

After the fix, loading the malformed image produces a usable Err result:

Format error decoding Gif: unexpected EOF
HeroicKatora commented 2 years ago

I've thrown together a patch for to put in debug_asserts for the actual invariants that the code is working under:

Patch file ```diff From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Tue, 19 Apr 2022 21:29:13 +0200 Subject: [PATCH] Add debug assertions on internal invariants --- src/decode.rs | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/decode.rs b/src/decode.rs index 283e31f..49f3bfd 100644 --- a/src/decode.rs +++ b/src/decode.rs @@ -711,7 +711,7 @@ impl Stateful for DecodeState { Some(tup) => { status = Ok(LzwStatus::Ok); code_link = Some(tup) - }, + } }; // Track an empty `burst` (see below) means we made no progress. @@ -827,6 +827,7 @@ impl Stateful for DecodeState { // the case of requiring an allocation (which can't occur in practice). let new_link = self.table.derive(&link, cha, code); self.next_code += 1; + debug_assert!(self.next_code as usize <= MAX_ENTRIES); code = burst; link = new_link; } @@ -918,6 +919,8 @@ impl Stateful for DecodeState { } self.next_code += 1; + debug_assert!(self.next_code as usize <= MAX_ENTRIES); + new_link = link; } else { // It's actually quite likely that the next code will be a reset but just in case. @@ -1203,6 +1206,13 @@ impl Table { } fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link { + debug_assert!( + self.inner.len() < MAX_ENTRIES, + "Invalid code would be created {:?} {} {:?}", + from.prev, + byte, + prev + ); let link = from.derive(byte, prev); let depth = self.depths[usize::from(prev)] + 1; self.inner.push(link.clone()); -- 2.35.1 ```

The trace of running decoding with those suggest that the comparison itself relies on an incorrect assumption. Since it uses == it relies on self.next_code <= self.code_buffer.max_code() but that doesn't hold. When we reach 12-bits then the code buffer does not get larger and max_code() remains at 4095. At the same time next_code will advance to 4096, and never beyond in the sequential code path, a code that will never be created and thus works correctly with the rest of the logic.

But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance next_code beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:

if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {

I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.