zeek / spicy

C++ parser generator for dissecting protocols & files.
https://docs.zeek.org/projects/spicy

Real value parsing from ASCII representation. #1750

Closed sethhall closed 4 months ago

sethhall commented 5 months ago

In the tests (https://github.com/zeek/spicy/blob/main/tests/spicy/types/real/parse.spicy) it appears that there is only support for parsing reals that are stored as IEEE values. It would be nice to be able to parse string representations of floating point numbers into real values.

For example, the ASCII bytes "-1.34432" parsed from a byte stream should yield the real value -1.34432. This doesn't seem to be possible currently with built-in mechanisms.

bbannier commented 5 months ago

Thanks for the issue, this indeed seems to be missing currently. Another good place for this would probably be a method on bytes, where we already provide functions like to_int, to_uint or to_time. Such a function would be locale-dependent (the decimal separator varies) and not loss-free, since not all decimal numbers can be represented exactly as real, so it would be more "dangerous" than the existing functions.
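To illustrate the "not loss-free" point, here is a small Python sketch (Python rather than Spicy, purely as a model of IEEE double behavior; the name `is_exact` is made up for this example). It checks whether a decimal string survives conversion to a double without any change in value.

```python
from decimal import Decimal

def is_exact(text: str) -> bool:
    """Return True if the decimal text is exactly representable as a double.

    Decimal(float(text)) is the exact value stored in the double, so comparing
    it with Decimal(text) shows whether the conversion lost anything.
    """
    return Decimal(float(text)) == Decimal(text)

if __name__ == "__main__":
    for sample in ("0.5", "0.1", "-1.34432"):
        print(sample, "exact" if is_exact(sample) else "rounded")
```

Values whose fractional part is a sum of powers of two (such as 0.5) convert exactly; most others, including the "-1.34432" from the report, are rounded to the nearest double.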

For the time being one could implement conversion bytes -> real in a helper function, e.g.,

function to_real(str: bytes): real {
    # Assume locale with decimal separator `.`.
    local xs = str.split(b".");

    # A valid fractional number has the form `XXX.YYYYY` with exactly one separator.
    assert |xs| == 2;

    local a = cast<real>(xs[0].to_int());

    local b = xs[1];
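    # Take the sign from the text rather than from `a`: for inputs such as
    # `0.5` or `-0.5` the integer part is zero and carries no sign.
    local sgn = xs[0].starts_with(b"-") ? -1.0 : 1.0;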

    return a + sgn * cast<real>(b.to_uint()) / (10**cast<real>(|b|));
}

type X = unit {
    x: bytes &until=b"\n" &convert=to_real($$);
};
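As a way to check the arithmetic outside of Spicy, the following is a rough Python model of the same split-and-scale approach. It is only a sketch: `to_real_model` is a made-up name, and it reads the sign from a leading `-` in the text, since the integer part of inputs like `0.5` or `-0.5` is zero and cannot carry a sign.

```python
def to_real_model(text: bytes) -> float:
    """Python model of the integer-part plus scaled-fraction conversion."""
    # Assume the decimal separator is `.` and that it occurs exactly once.
    parts = text.split(b".")
    if len(parts) != 2:
        raise ValueError("expected exactly one '.' separator")
    whole, frac = parts

    # The sign comes from the text, because int(b"-0") is just 0.
    sign = -1.0 if whole.startswith(b"-") else 1.0

    # The fraction digits are scaled by 10 to the number of digits.
    return float(int(whole)) + sign * int(frac) / 10 ** len(frac)

if __name__ == "__main__":
    for sample in (b"-1.34432", b"0.5", b"-0.5"):
        print(sample, to_real_model(sample))
```

Because the result is assembled from two floating-point operations, it may differ from the nearest double to the input text in the last bit, so comparisons should use a tolerance.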
sethhall commented 5 months ago

Oh nice! I couldn't figure out how to write that function. Thanks!

sethhall commented 5 months ago

Thanks again for that function. Here's a mostly complete JSON parser I needed it for: https://gist.github.com/sethhall/386c941a0f778d8b79be03c7fbfd47d0