rust-bakery / nom

Rust parser combinator framework
MIT License
9.18k stars 792 forks source link

unsigned integer underflow #1696

Open mxyns opened 9 months ago

mxyns commented 9 months ago

Hello,

While using nom I encountered an issue with this impl of Offset for [u8] https://github.com/rust-bakery/nom/blob/8c68e22d9f2d0aadaa1770d980d59ea562dfb6a7/src/traits.rs#L599-L606

The return value underflows when the first span is after the second in memory. This means the user always has to think about whether to use first.offset(second) or second.offset(first):

fn main() {

    let buf = [0,1,2,3,4,5,6,7,8,9];
    let mut first: LocatedSpan<&[u8]>= LocatedSpan::from(&buf[..]);
    let mut second = some_parser::read(first);

    let offset = first.offset(&second);
    println!("offset is negative: {}", offset);

    let offset = second.offset(&first);
    println!("offset is positive: {}", offset);
}

To me, an offset means it may be negative but maybe changing the return type of the offset method to isize may be problematic elsewhere, I don't know a lot about the internals of nom.

Maybe just doing an absolute value of the result could be enough? or let Offset have a signed version of the offset method?

danielocfb commented 2 months ago

Seeing the same issue. Also, I am not sure if "user error" is the right connotation here. The trait seems pretty broken if you ask me, as it appears to make the assumption that self and second are related, but that is not enforced or even documented anywhere to the degree I can tell. Okay, perhaps one can guess that relationship by what it does. And yet, this functionality is used in higher level APIs where this contract seems lost entirely.

E.g., this program panics to due over/underflow:

use nom::error::ErrorKind::Tag;
use nom::error::VerboseErrorKind::Nom;
use nom::error::convert_error;
use nom::error::VerboseError;

fn main() {
        let input = [31, 139, 8, 8, 85, 135, 48, 102, 2, 255, 108, 105, 98];
        let err = VerboseError {
            errors: vec![(String::from_utf8_lossy(&input), Nom(Tag))],
        };
        let _x = convert_error(String::from_utf8_lossy(&input), err);
}

the problem isn't really obvious at all. And in a general setting, neither may the fix be (on the user's end).

Geal commented 2 months ago

@mxyns @danielocfb as you saw, Offset assumes that the argument is part of the same slice and that should be documented. I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

danielocfb commented 2 months ago

I looked at the change in blazesym is why the lossy conversion step would be needed in the first place? Because convert_error was not available for &[u8] inputs?

Yes, that is the reason. The function assumes something that derefs into str. So I we need some kind of conversion first.

Btw., I accidentally commented what is not the most fitting issue, I think https://github.com/rust-bakery/nom/issues/1619 actually covers this very problem already.