rust-bakery / nom-bufreader

BufReader adapter for nom parsers
MIT License
9 stars 5 forks source link

Near empty buffer not refilling #2

Closed rob-p closed 3 years ago

rob-p commented 3 years ago

Hi @Geal,

First of all, thanks for writing this awesome wrapper crate. I'm trying to write the nom parser for my file format, and I have part of the file that consists of "string" entries. These are records of the form (u16, [u8]) where the first element gives the string length and the second the actual string. I wrote a parser for this and it seems to work brilliantly ... until the buffer is almost empty. I end up in a situation where the natural (initial) buffer size ends in the middle of a string.

The expected behavior was that the parse method fails with Incomplete, the buffer is filled with more data, and then the next call to parse succeeds. However, this isn't what happens. Instead, I get stuck in an infinite loop where there is always 4 bytes left in the buffer. It seems the BufReader is never refilled. I think this line is the culprit.

Here's where things get hairy. From the documentation of BufReader::fill_buf, I get the impression that it might only fill the buffer if it is completely empty. Thus, if there is a call to fill_buff while there are contents left in the internal buffer (even if only 4 bytes) then the buffer won't get refilled and we get stuck in this infinite loop.

Anyway, please let me know if this makes sense or if it seems like I'm overlooking something simple.

Thanks! Rob

P.S. Relevant StackOverflow.

Geal commented 3 years ago

Right, so this is the other reason why BufReader was not really usable. IIRC I had a fork that would call fill_buf even if there was still some data in the buffer. I think that's what should be done here, hold another version of BufReader in this crate to solve this once and for all.

There's a further issue in detecting when the underlying stream is closed (normally or error): we might still have some data to parse in the buffer once it is closed, so it should not return immediately

rob-p commented 3 years ago

Sounds like a good solution! Is the fork public?

Geal commented 3 years ago

it's there but quite old: https://github.com/Geal/nom/blob/d66466901ba6917beb34cdc3c54cbee38c7ae901/src/accumulator.rs It would be better to start over

Geal commented 3 years ago

ok, can you try with the implementation in the bufreader module? 1541037

rob-p commented 3 years ago

Hi @Geal,

I've only written the parser for the header of the file, but I can confirm that in that use case, this change fixes the issue I saw before. I'm now able to completely (and accurately) parse the whole header!

Thanks, Rob

Geal commented 3 years ago

good to know! I've pushed a 0.1.1 version with that new bufreader, and the async version as well