Add lax_deserialize feature

jplatte commented 4 years ago

CC ruma/ruma#99 @MTRNord @DevinR528 @iinuwa @poljar

Left to do:

docs
tests

iinuwa commented 4 years ago

Nice. Just a thought, is it worth being able to configure how lax the deserialization is? Specifically, 5.0 -> 5 doesn't lose any data, but 5.1 -> 5 does. Could the former be considered "non-lax"?

jplatte commented 4 years ago

I did think about that. It would be possible, but then I should probably hand-roll deserialization instead, so 10000.00000000000000001 is rejected, since that would be accepted with f64 deserialization + checking whether the fractional part is 0. It may be the better path forward either way, since I'm not entirely sure MIN_SAFE_INT - 1 and. MAX_SAFE_INT + 1 are handled properly right now (they probably are).

poljar commented 4 years ago

Thanks for this, I'm already fine with the initial implementation under the assumption that MIN/MAX_SAFE are handled correctly.

Checking the fractional part sounds nice as well so no objections here.

jplatte commented 4 years ago

After thinking more about this, since js_int is not tied to deserialization from JSON, hand-rolling parsing is impossible or at the very least doesn't make sense. I don't see there being a good solution for "only-if-fractional-equals-zero".

DevinR528 commented 4 years ago

This is what I was thinking at first (this is more just in case, I am fine with the current implementation).

This has a lot of allocation for such a small task

impl<'de> Deserialize<'de> for UInt {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let val = f64::deserialize(deserializer)?.to_string();
        match val.split('.').collect::<Vec<_>>().as_slice() {
            &[int] => {
                let num: u64 = int.parse().map_err(D::Error::custom)?;
                Self::new(num).ok_or_else(|| {
                    D::Error::invalid_value(
                        Unexpected::Unsigned(num),
                        &"an integer between 0 and 2^53 - 1",
                    )
                })
            }
            &[int, dec] => {
                if !dec.replace("0", "").is_empty() {
                    return Err(D::Error::custom("expected an integer found a float"));
                }

                let num: u64 = int.parse().map_err(D::Error::custom)?;
                Self::new(num).ok_or_else(|| {
                    D::Error::invalid_value(
                        Unexpected::Unsigned(num),
                        &"an integer between 0 and 2^53 - 1",
                    )
                })
            }
            _ => Err(D::Error::custom("found invalid number")),
        }
    }
}

poljar commented 4 years ago

This is what I was thinking at first (this is more just in case, I am fine with the current implementation).

This has a lot of allocation for such a small task

impl<'de> Deserialize<'de> for UInt {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        let val = f64::deserialize(deserializer)?.to_string();
        match val.split('.').collect::<Vec<_>>().as_slice() {
            &[int] => {
                let num: u64 = int.parse().map_err(D::Error::custom)?;
                Self::new(num).ok_or_else(|| {
                    D::Error::invalid_value(
                        Unexpected::Unsigned(num),
                        &"an integer between 0 and 2^53 - 1",
                    )
                })
            }
            &[int, dec] => {
                if !dec.replace("0", "").is_empty() {
                    return Err(D::Error::custom("expected an integer found a float"));
                }

                let num: u64 = int.parse().map_err(D::Error::custom)?;
                Self::new(num).ok_or_else(|| {
                    D::Error::invalid_value(
                        Unexpected::Unsigned(num),
                        &"an integer between 0 and 2^53 - 1",
                    )
                })
            }
            _ => Err(D::Error::custom("found invalid number")),
        }
    }
}

Out of interest, do you need to convert the float to a string? Can't you check if the float is a number using is_finite() and then get to the fractional part using fract()?

Converting a string to a f64 and then back to a string seems a bit much, am I missing something?

jplatte commented 4 years ago

I've added tests and they are passing, at least when increasing serde_jsons float parsing accuracy 😅

CC serde-rs/json#692

jplatte commented 4 years ago

I thought all tests passed locally. Seems unrelated to the new logic, will investigate later.

DevinR528 commented 4 years ago

Out of interest, do you need to convert the float to a string? Can't you check if the float is a number using is_finite() and then get to the fractional part using fract()?

Yeah, that would have been much better. I didn't look closely enough at the f64 methods thanks for the tip @poljar that'll come in handy.

jplatte commented 4 years ago

Can't you check if the float is a number using is_finite() and then get to the fractional part using fract()?

I guess I didn't make this clear enough before: the reason I don't like this (which equally applies to @DevinR528's solution though) is that floats with a non-zero fractional part that can't be represented by f64 would still be accepted, so the check would be imperfect. At that point I'd rather throw away the fractional part altogether.

Examples:

10000.00000000000000001f64 == 10000.0f64 9007199254740991.1f64 == 9007199254740991.0f64

poljar commented 4 years ago

Can't you check if the float is a number using is_finite() and then get to the fractional part using fract()?

I guess I didn't make this clear enough before: the reason I don't like this (which equally applies to @DevinR528's solution though) is that floats with a non-zero fractional part that can't be represented by f64 would still be accepted, so the check would be imperfect. At that point I'd rather throw away the fractional part altogether.

Oh the question wasn't really pointed at you. I didn't mean to ask why you're not checking it that way, rather it was pointed at @DevinR528 since the string dance confused me and seemed pointless.

Sorry I thought that quoting the code snippet would make it clear.

jplatte commented 4 years ago

And my explanation was mostly aimed at devin too 😉 😆

ruma / js_int

Add lax_deserialize feature #15