image-rs / weezl

LZW en- and decoding that goes weeeee!
Apache License 2.0
26 stars 7 forks source link

Invalid codes being created during decode: `debug_asserts` for invariants #31

Open HeroicKatora opened 2 years ago

HeroicKatora commented 2 years ago

These codes are inserted into the table, but can't be used or referenced in the code text. They are just 'waste'.

I've thrown together a patch for to put in debug_asserts for the actual invariants that the code is working under:

Patch file ```diff From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001 From: Andreas Molzer Date: Tue, 19 Apr 2022 21:29:13 +0200 Subject: [PATCH] Add debug assertions on internal invariants --- src/decode.rs | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/src/decode.rs b/src/decode.rs index 283e31f..49f3bfd 100644 --- a/src/decode.rs +++ b/src/decode.rs @@ -711,7 +711,7 @@ impl Stateful for DecodeState { Some(tup) => { status = Ok(LzwStatus::Ok); code_link = Some(tup) - }, + } }; // Track an empty `burst` (see below) means we made no progress. @@ -827,6 +827,7 @@ impl Stateful for DecodeState { // the case of requiring an allocation (which can't occur in practice). let new_link = self.table.derive(&link, cha, code); self.next_code += 1; + debug_assert!(self.next_code as usize <= MAX_ENTRIES); code = burst; link = new_link; } @@ -918,6 +919,8 @@ impl Stateful for DecodeState { } self.next_code += 1; + debug_assert!(self.next_code as usize <= MAX_ENTRIES); + new_link = link; } else { // It's actually quite likely that the next code will be a reset but just in case. @@ -1203,6 +1206,13 @@ impl Table { } fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link { + debug_assert!( + self.inner.len() < MAX_ENTRIES, + "Invalid code would be created {:?} {} {:?}", + from.prev, + byte, + prev + ); let link = from.derive(byte, prev); let depth = self.depths[usize::from(prev)] + 1; self.inner.push(link.clone()); -- 2.35.1 ```

The trace of running decoding with those suggest that the comparison itself relies on an incorrect assumption. Since it uses == it relies on self.next_code <= self.code_buffer.max_code() but that doesn't hold. When we reach 12-bits then the code buffer does not get larger and max_code() remains at 4095. At the same time next_code will advance to 4096, and never beyond in the sequential code path, a code that will never be created and thus works correctly with the rest of the logic.

But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance next_code beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:

if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {

I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.

Originally posted by @HeroicKatora in https://github.com/image-rs/lzw/issues/30#issuecomment-1103073574