These codes are inserted into the table, but can't be used or referenced in the code text. They are just 'waste'.
I've thrown together a patch for to put in debug_asserts for the actual invariants that the code is working under:
Patch file
```diff
From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001
From: Andreas Molzer
Date: Tue, 19 Apr 2022 21:29:13 +0200
Subject: [PATCH] Add debug assertions on internal invariants
---
src/decode.rs | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/src/decode.rs b/src/decode.rs
index 283e31f..49f3bfd 100644
--- a/src/decode.rs
+++ b/src/decode.rs
@@ -711,7 +711,7 @@ impl Stateful for DecodeState {
Some(tup) => {
status = Ok(LzwStatus::Ok);
code_link = Some(tup)
- },
+ }
};
// Track an empty `burst` (see below) means we made no progress.
@@ -827,6 +827,7 @@ impl Stateful for DecodeState {
// the case of requiring an allocation (which can't occur in practice).
let new_link = self.table.derive(&link, cha, code);
self.next_code += 1;
+ debug_assert!(self.next_code as usize <= MAX_ENTRIES);
code = burst;
link = new_link;
}
@@ -918,6 +919,8 @@ impl Stateful for DecodeState {
}
self.next_code += 1;
+ debug_assert!(self.next_code as usize <= MAX_ENTRIES);
+
new_link = link;
} else {
// It's actually quite likely that the next code will be a reset but just in case.
@@ -1203,6 +1206,13 @@ impl Table {
}
fn derive(&mut self, from: &Link, byte: u8, prev: Code) -> Link {
+ debug_assert!(
+ self.inner.len() < MAX_ENTRIES,
+ "Invalid code would be created {:?} {} {:?}",
+ from.prev,
+ byte,
+ prev
+ );
let link = from.derive(byte, prev);
let depth = self.depths[usize::from(prev)] + 1;
self.inner.push(link.clone());
--
2.35.1
```
The trace of running decoding with those suggest that the comparison itself relies on an incorrect assumption. Since it uses == it relies on self.next_code <= self.code_buffer.max_code() but that doesn't hold. When we reach 12-bits then the code buffer does not get larger and max_code() remains at 4095. At the same time next_code will advance to 4096, and never beyond in the sequential code path, a code that will never be created and thus works correctly with the rest of the logic.
But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance next_code beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:
if potential_code >= self.code_buffer.max_code() - Code::from(self.is_tiff) {
I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.
These codes are inserted into the table, but can't be used or referenced in the code text. They are just 'waste'.
I've thrown together a patch for to put in
debug_asserts
for the actual invariants that the code is working under:Patch file
```diff From ac575ce26fb081883092536e0fcbf00c2af59cc2 Mon Sep 17 00:00:00 2001 From: Andreas MolzerThe trace of running decoding with those suggest that the comparison itself relies on an incorrect assumption. Since it uses
==
it relies onself.next_code <= self.code_buffer.max_code()
but that doesn't hold. When we reach 12-bits then the code buffer does not get larger andmax_code()
remains at 4095. At the same timenext_code
will advance to 4096, and never beyond in the sequential code path, a code that will never be created and thus works correctly with the rest of the logic.But when that is the exact moment that we enter a burst, as is the case with the provided file, then it will advance
next_code
beyond that and not notice that the maximum code has been reached. An easy fix would be to adjust the condition:I'll measure if that leads to too much of a performance loss due to executing less of the simple code reconstruction.
Originally posted by @HeroicKatora in https://github.com/image-rs/lzw/issues/30#issuecomment-1103073574