Closed evanrichter closed 2 years ago
Looking good!
I think the issue is these lines in the fuzzer:
    for page in get_page_iterator(column_meta, &mut reader, None, Vec::new(), 16 * 1024)? {
        let mut decompress_buffer = Vec::new();
        if let Ok(page) = page {
            let _page = decompress(page, &mut decompress_buffer);
        }
    }
I think we want the following:
    for maybe_page in get_page_iterator(column_meta, &mut reader, None, Vec::new(), 16 * 1024)? {
        let mut decompress_buffer = Vec::new();
        let page = maybe_page?;
        let _page = decompress(page, &mut decompress_buffer)?;
    }
so that we stop the iteration on an error, either in getting the page or in decompressing it (thus the two ? operators).
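To illustrate the difference between the two loops, here is a minimal, self-contained sketch. It does not use parquet2: `pages`, `skip_errors`, and `stop_on_error` are hypothetical names, and the iterator simply yields a good page, an error, and another good page.

```rust
use std::io::{Error, ErrorKind, Result};

/// Hypothetical stand-in for the page iterator: two good pages around a bad one.
fn pages() -> impl Iterator<Item = Result<u32>> {
    vec![
        Ok(1),
        Err(Error::new(ErrorKind::UnexpectedEof, "truncated page")),
        Ok(3),
    ]
    .into_iter()
}

/// Mirrors the first loop: errors are silently skipped and iteration continues.
fn skip_errors() -> Vec<u32> {
    let mut seen = Vec::new();
    for page in pages() {
        if let Ok(page) = page {
            seen.push(page);
        }
    }
    seen
}

/// Mirrors the second loop: the first error leaves the loop through `?`.
fn stop_on_error(seen: &mut Vec<u32>) -> Result<()> {
    for maybe_page in pages() {
        let page = maybe_page?;
        seen.push(page);
    }
    Ok(())
}
```

With the first form, an iterator that keeps returning errors without advancing would be polled forever, which would look like a timeout to a fuzzer; with the second form the harness returns as soon as the first error appears.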
that was the issue, false alarm :sweat_smile:
My fuzzer found a timeout that seemed like the Reader was not making progress. Debugging the root cause of that is difficult, usually involving perf and flamegraph to see what is executing. So I updated the harness to detect when Read or Seek operations have not changed the cursor for 8 times in a row, and turned that into a panic with a nice backtrace:

I'm not sure if this is a side-effect of fuzzing with mocked I/O (e.g. std::io::Cursor or my FuzzInput) and if a real std::fs::File implements read and read_exact in a way that allows parquet2 to terminate here.

Initial tests with the following read_exact produce infinite EOF! printed to the screen:

Test case attached (not really a txt file, use with cargo-fuzz on my fork): minimized-from-b7067ad7c9c1eccf07c13b9c4bc7e2632a81b52f.txt
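For readers who want to try the same trick, here is a rough sketch of a stall-detecting reader. It is not the harness from this thread: `StallDetector`, `LIMIT`, and `check` are hypothetical names, and the only behaviour taken from the description above is "panic after 8 consecutive Read or Seek calls that leave the cursor unchanged".

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Number of consecutive non-advancing operations tolerated before panicking.
const LIMIT: u32 = 8;

/// Hypothetical wrapper around any `Read + Seek` source that tracks the cursor.
struct StallDetector<R> {
    inner: R,
    last_pos: u64,
    stalled: u32,
}

impl<R: Read + Seek> StallDetector<R> {
    fn new(mut inner: R) -> io::Result<Self> {
        let last_pos = inner.stream_position()?;
        Ok(Self { inner, last_pos, stalled: 0 })
    }

    /// Compare the cursor with its previous value and update the stall counter.
    fn check(&mut self) -> io::Result<()> {
        let pos = self.inner.stream_position()?;
        if pos == self.last_pos {
            self.stalled += 1;
            // The panic gives the fuzzer a backtrace instead of a timeout.
            assert!(
                self.stalled < LIMIT,
                "no progress after {} operations at offset {}",
                LIMIT,
                pos
            );
        } else {
            self.stalled = 0;
            self.last_pos = pos;
        }
        Ok(())
    }
}

impl<R: Read + Seek> Read for StallDetector<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        self.check()?;
        Ok(n)
    }
}

impl<R: Read + Seek> Seek for StallDetector<R> {
    fn seek(&mut self, pos: SeekFrom) -> io::Result<u64> {
        let new_pos = self.inner.seek(pos)?;
        self.check()?;
        Ok(new_pos)
    }
}
```

Wrapping the fuzz input, for example `StallDetector::new(std::io::Cursor::new(data))`, before handing it to the code under test is enough to use it. Note that this sketch also counts legitimate no-op calls (an empty read buffer, a seek to the current offset) as stalls, so the limit may need tuning.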