While experimenting with `blk-archive` I tried to `pack` `/dev/zram0`. In my case the block device is mostly zeros, but there are a few bytes of non-zero data. The issue is that the chunker fails to find any `consumes` and ultimately treats the entire 8G file as one `chunk`. This chunk gets tested for a repeating pattern, which fails, so the entire 8G chunk is then added as a single slab entry, which panics with:
```
thread 'main' panicked at src/hash_index.rs:64:13:
assertion failed: e.len <= u32::MAX as usize
```
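For scale (a quick standalone check, not blk-archive code), an 8 GiB chunk is roughly twice the largest length a `u32` can represent, so any `len <= u32::MAX` check on a single whole-device chunk has to fail:

```rust
fn main() {
    // 8 GiB expressed in bytes vs. the largest value a u32 can hold.
    let chunk_len: usize = 8 * 1024 * 1024 * 1024; // 8_589_934_592
    let max_len: usize = u32::MAX as usize;        // 4_294_967_295

    // The assertion in hash_index.rs trips for the same reason this prints `true`.
    println!("chunk_len {chunk_len} > u32::MAX {max_len}: {}", chunk_len > max_len);
}
```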
I tried a simple change that treats the entire buffer as a `chunk` if the splitter fails to find any consumes:
```diff
diff --git a/src/content_sensitive_splitter.rs b/src/content_sensitive_splitter.rs
index c07d448..b9ee687 100644
--- a/src/content_sensitive_splitter.rs
+++ b/src/content_sensitive_splitter.rs
@@ -169,6 +169,14 @@ impl ContentSensitiveSplitter {
             }
         }
 
+        // If we haven't found any segments, let's consume this contiguous chunk. We likely
+        // have a data stream that is a repeating pattern, e.g. all zeros.
+        // Note: the max length this can be is u32::MAX, as IndexBuilder asserts if the length
+        // exceeds it, which is what happens without this check.
+        if consumes.is_empty() {
+            consumes.push(data.len());
+        }
+
         consumes
     }
 }
```
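As an aside, one way to make a fallback like this safe for arbitrarily large buffers would be to cap each consume at `u32::MAX` bytes. The following is only a sketch with a hypothetical free function (it doesn't use blk-archive's actual splitter types), just to show the splitting arithmetic:

```rust
/// Hypothetical helper: when no content-defined boundaries are found, fall back to
/// fixed-size consumes, capping each piece at u32::MAX bytes so a later
/// `len <= u32::MAX` assertion cannot fire.
fn fallback_consumes(data_len: usize) -> Vec<usize> {
    let cap = u32::MAX as usize;
    let mut consumes = Vec::new();
    let mut remaining = data_len;
    while remaining > 0 {
        let piece = remaining.min(cap);
        consumes.push(piece);
        remaining -= piece;
    }
    consumes
}

fn main() {
    // An 8 GiB buffer with no content-defined boundaries splits into three pieces.
    let consumes = fallback_consumes(8 * 1024 * 1024 * 1024);
    assert_eq!(consumes.len(), 3);
    assert!(consumes.iter().all(|&len| len <= u32::MAX as usize));
    println!("{consumes:?}");
}
```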
This gets around this specific problem, but unfortunately it introduces other errors during unpack when dealing with a file that is all one repeating pattern. I haven't dug into why this simple change introduces an error in stream creation that causes the unpack to panic. I do see that even when we force smaller chunks, and they are found to be a repeating pattern, the many entries get combined into one map entry anyway. The problem seems to stem from additional stream instructions that refer to a slab that doesn't exist.