rust-bakery / nom

Rust parser combinator framework
MIT License
9.34k stars 801 forks source link

Parsing of long floating point numbers causes panic #1421

Open MatejKastak opened 2 years ago

MatejKastak commented 2 years ago

Hello, first of all thank you, nom is a great tool!

I have tried to migrate to a new nom version (7.0.0) and encountered some unexpected behavior.

After a little digging, I think this is caused by switching to minimal-lexical.

the lexical-core crate used for float parsing has now been replaced with minimal-lexical: the new crate is faster to compile, faster to parse, and has no dependencies

Somewhat related to #1384

Prerequisites

Test case

mod tests {
    use nom::number::complete::float;
    use nom::IResult;

    fn parser(s: &str) -> IResult<&str, f32> {
        float(s)
    }

    #[test]
    fn float_panic() {
        // 7.0.0 -> panic
        // 6.2.1 -> pass
        assert!(std::matches!(parser("0.00000000000000000087"), Ok(_)));
        assert!(std::matches!(
            parser("0.00000000000000000000000000000000000000000000000000000000000000087"),
            Ok(_)
        ));

    }
}               

nom v7.0.0

---- tests::float_panic stdout ----
thread 'tests::float_panic' panicked at 'attempt to shift left with overflow', /home/anon/.cargo/registry/src/github.com-1ecc6299db9ec823/minimal-lexical-0.1.4/src/lemire.rs:155:5

nom v6.2.1

running 1 test
test tests::float_panic ... ok
Geal commented 2 years ago

thanks for the report. Could you test with nom directly from master?

MatejKastak commented 2 years ago

Sure.

~/dev/sandbox/nom-float-panic-new master* 21s
λ cargo test
    Updating crates.io index
   Compiling version_check v0.9.3
   Compiling memchr v2.3.4
   Compiling minimal-lexical v0.2.1
   Compiling nom v7.0.0 (/home/anon/dev/sandbox/nom)
   Compiling nom-float-panic-new v0.1.0 (/home/anon/dev/sandbox/nom-float-panic-new)
    Finished test [unoptimized + debuginfo] target(s) in 9.44s
     Running unittests (target/debug/deps/nom_float_panic_new-b07783cdd65cd87f)

running 1 test
test tests::float_panic ... FAILED

failures:

---- tests::float_panic stdout ----
thread 'tests::float_panic' panicked at 'attempt to shift left with overflow', /home/anon/.cargo/registry/src/github.com-1ecc6299db9ec823/minimal-lexical-0.2.1/src/lemire.rs:155:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I did a git clone, current master HEAD is at dfa55914dc9673e3092974e00f8967220fbfaf78, and then specified nom dependency as path

[dependencies]
# nom = "6"
nom = { path = "../nom" }
nbigaouette commented 2 years ago

I got the same panic using b"0.00000000000000000018" as input.

mullr commented 2 years ago

Here's the workaround I'm using for now. It's a bit of a cudgel, but afaict it gets the old behavior back.

/// This is derived from nom::number::complete::float, which is unfortunately broken in nom 7.0.0.
/// - https://github.com/Geal/nom/issues/1421
/// - https://github.com/Geal/nom/issues/1384
pub fn float<T, E: ParseError<T>>(input: T) -> IResult<T, f32, E>
where
    T: nom::Slice<std::ops::RangeFrom<usize>>
        + nom::Slice<std::ops::RangeTo<usize>>
        + nom::Slice<std::ops::Range<usize>>,
    T: Clone + nom::Offset,
    T: nom::InputIter + nom::InputLength + nom::InputTake,
    <T as nom::InputIter>::Item: nom::AsChar + Copy,
    <T as nom::InputIter>::IterElem: Clone,
    T: nom::InputTakeAtPosition,
    <T as nom::InputTakeAtPosition>::Item: nom::AsChar,
    T: nom::AsBytes,
    T: for<'a> nom::Compare<&'a [u8]>,
    T: nom::ParseTo<f32>,
    T: nom::InputTake + nom::Compare<&'static str>,
{
    let (i, s) = recognize_float(input)?;
    match s.parse_to() {
        Some(f) => (Ok((i, f))),
        None => Err(nom::Err::Error(E::from_error_kind(
            i,
            nom::error::ErrorKind::Float,
        ))),
    }
}

pub fn recognize_float<T, E: ParseError<T>>(input: T) -> IResult<T, T, E>
where
    T: nom::Slice<std::ops::RangeFrom<usize>> + nom::Slice<std::ops::RangeTo<usize>>,
    T: Clone + nom::Offset,
    T: nom::InputIter,
    <T as nom::InputIter>::Item: nom::AsChar,
    T: nom::InputTakeAtPosition,
    <T as nom::InputTakeAtPosition>::Item: nom::AsChar,
    T: nom::InputTake + nom::Compare<&'static str>,
{
    alt((
        nom::number::complete::recognize_float,
        nom::combinator::recognize(nom::bytes::complete::tag("NaN")),
    ))(input)
}
parasyte commented 2 years ago

This should have been closed by #1455

Geal commented 2 years ago

Yes. Leaving it open for now, I'd like to get a proper fix in minimal-lexical, because we take a perf hit with that workaround