rust-bakery / nom

Rust parser combinator framework
MIT License
9.45k stars 804 forks source link

Panic using 'consumed()': Problem with 'Offset' and empty slice? #1424

Open nbigaouette opened 3 years ago

nbigaouette commented 3 years ago

Hi all!

Since I adapted my parser to use consumed(), I get weird panics coming from nom.

I have trouble reproducing it with a MVE. Using a debugger on my test case though revealed the problem. Let me give a description of what happens...

When inside consumed(), the i is b"[]" (a byte slice containing an opening and closing brackets). After my parser get called, the match goes to the Ok branch. There, both remaining and result are empty slices (&[]). Then calling input.offset(&remaining) returns an out-of-range index!! Using input.slice(..index) on the following line will thus panic.

Here's a runable example that shows the bad offset calculation:

fn main() {
    let input = b"[]";
    println!("input.offset(&[]): {}", input.offset(&[]));
}

This prints the following:

input.offset(&[]): 289

If this offset is used to index the input, this will panic.

Stargateur commented 3 years ago

you can't do that, offset need to be used with the same origin, aka pointer, and the argument must be "same or after" self. I think we should update doc on this one.

It's even probably UB to do it.

fn main() {
    let input = b"[]";
    assert_eq!(0, input.offset(&input));
}

should work correctly. See test here

nbigaouette commented 3 years ago

Yes, looking further into this I realized my mistake. The parser I use with consumed() is a custom one that returned &[] in a specific case. Since this empty slice is not a subslice of the input the offset() call returns bogus indices.

When I got the panic I checked the doc on any possible panic and did not find anything. I agree that this information should probably be part of the offset() documentation. I would have found and fixed my issue :D

Stargateur commented 3 years ago

The parser I use with consumed() is a custom one that returned &[] in a specific case.

We should probably also advice to never do that and prefer, input[x..x], x being the index where your parser decide to "valid" the input, x can be 0..=input.len(), should work better.

I'm not good in english so I pass my turn on this PR. Need to update at least:

nbigaouette commented 3 years ago

That was the fix I made. Instead of returning &[] (which we should never do) I returned (something like) &input[input.len()..]. In my case I wanted the "offset" to be at the end of the input.

Geal commented 3 years ago

sure, I'll add some doc for that