gyscos / zstd-rs

A rust binding for the zstd compression library.
MIT License

Endless loop on incomplete frame error #182

Open mostlymaxi opened 1 year ago

mostlymaxi commented 1 year ago

Stream decoder will loop forever on an incomplete frame error.

I often run into corrupted zst files that fail with a premature-end read error: test.zst : 1036 MB... test.zst : Read error (39) : premature end

This tends to occur when a zst file is truncated.

I am using the stream decoder to iterate over the lines of this corrupted file, and the error I get is: Custom { kind: UnexpectedEof, error: "incomplete frame", }

Strangely, the decoder then keeps trying to pull the same frame and returns this error in a loop. Should the stream close on error?

Code:

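use std::fs::File;
use std::io::{BufRead, BufReader};
use zstd::stream::read::Decoder;

fn main() {
    let file = File::open("test").unwrap();
    let rdr = Decoder::new(file).unwrap();
    let rdr = BufReader::new(rdr);

    for line in rdr.lines() {
        // On the truncated file, `line` is the same Err on every iteration.
    }
}

gyscos commented 1 year ago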

Ah, I finally understand what happens: when it reaches the unexpected EOF, the Decoder returns an error. If you call it again, it keeps returning the same error. The Lines iterator simply forwards these errors, so calling .next() returns an error every time. As you can see, the only way for Lines to end iteration is for the reader to return Ok(0):

https://github.com/rust-lang/rust/blob/37d7de337903a558dbeb1e82c844fe915ab8ff25/library/std/src/io/mod.rs#L2822

If you unwrap() or return any error coming from rdr.lines(), it will stop iteration as expected.
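
A minimal sketch of that advice, using only the standard library. TruncatedReader is a hypothetical stand-in for the decoder on a truncated file (it is not part of zstd): it hands out its bytes once and then returns the same UnexpectedEof error on every later call. The helper lines_until_error is also an illustrative name; it returns at the first error instead of continuing to poll the iterator.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical stand-in for a decoder over a truncated file: it yields
/// `data` once, then returns the same error on every later call.
struct TruncatedReader {
    data: Vec<u8>,
    pos: usize,
}

impl Read for TruncatedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.pos < self.data.len() {
            // Copy as many remaining bytes as fit into the caller's buffer.
            let n = buf.len().min(self.data.len() - self.pos);
            buf[..n].copy_from_slice(&self.data[self.pos..self.pos + n]);
            self.pos += n;
            Ok(n)
        } else {
            // Never Ok(0): the stream keeps reporting the same failure.
            Err(io::Error::new(io::ErrorKind::UnexpectedEof, "incomplete frame"))
        }
    }
}

/// Collects complete lines and stops at the first error, which is
/// returned alongside the lines read so far.
fn lines_until_error<R: Read>(reader: R) -> (Vec<String>, Option<io::Error>) {
    let mut lines = Vec::new();
    for line in BufReader::new(reader).lines() {
        match line {
            Ok(l) => lines.push(l),
            // Returning here is what ends the loop; ignoring the error
            // would make the iterator yield it again on the next call.
            Err(e) => return (lines, Some(e)),
        }
    }
    (lines, None)
}
```

Returning the error together with the lines already read lets the caller decide whether the partial data is still useful.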

The issue is that the Decoder doesn't "fuse" (that is, start pretending to be an empty stream) when it finds an error. I wonder how other Reader implementations react: does a socket pretend to be valid and empty after returning an error if it gets disconnected? :shrug:
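
For illustration, here is one way "fusing" could look as a wrapper on the caller's side. This is only a sketch under assumptions: FuseOnError, AlwaysErr and collect_lines are hypothetical names, none of them exist in zstd, and this is not how the Decoder behaves today. The wrapper passes the first error through and reports Ok(0) on every later read, so Lines sees an end of stream.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical wrapper (not part of zstd): after the first error from the
/// inner reader, every later read reports a clean end of stream.
struct FuseOnError<R> {
    inner: R,
    failed: bool,
}

impl<R> FuseOnError<R> {
    fn new(inner: R) -> Self {
        FuseOnError { inner, failed: false }
    }
}

impl<R: Read> Read for FuseOnError<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.failed {
            return Ok(0);
        }
        match self.inner.read(buf) {
            // Interrupted is retryable by convention, so it does not fuse.
            Err(e) if e.kind() != io::ErrorKind::Interrupted => {
                self.failed = true;
                Err(e)
            }
            other => other,
        }
    }
}

/// Hypothetical reader that fails on every call, like a decoder that has
/// hit an incomplete frame.
struct AlwaysErr;

impl Read for AlwaysErr {
    fn read(&mut self, _buf: &mut [u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::UnexpectedEof, "incomplete frame"))
    }
}

/// Wraps `reader` in the fuse and gathers every item that Lines yields.
fn collect_lines<R: Read>(reader: R) -> Vec<io::Result<String>> {
    BufReader::new(FuseOnError::new(reader)).lines().collect()
}
```

With an input such as (&b"a\nb\n"[..]).chain(AlwaysErr), collect_lines terminates after two lines and a single error. The trade-off is that a caller who ignores that one error then sees what looks like a normal end of file, which may be why a reader would choose not to fuse.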