ipld / rust-ipld-core

This crate provides core types for interoperating with IPLD.
Apache License 2.0
5 stars 4 forks source link

serde deserializer fails deserialize to untagged enums which has integer(i128) values #19

Open sugyan opened 1 month ago

sugyan commented 1 month ago

Similar to the issue previously reported on https://github.com/ipld/serde_ipld_dagcbor/pull/21, when trying to deserialize to an Internally tagged or Untagged enum, it seems that the serde internally calls deserialize_any, in which case i128 is not supported.

The following test fails.

#[test]
fn ipld_deserializer_integer_untagged() {
    #[derive(Clone, Debug, Deserialize, PartialEq)]
    #[serde(untagged)]
    enum MyEnum {
        Foo { value: bool },
        Bar { value: i32 },
    }

    // foo: OK
    {
        let ipld = Ipld::Map(BTreeMap::from([("value".into(), Ipld::Bool(true))]));
        match MyEnum::deserialize(ipld) {
            Ok(deserialized) => {
                assert_eq!(deserialized, MyEnum::Foo { value: true });
            }
            Err(e) => {
                panic!("{e:?}");
            }
        }
    }
    // bar: FAIL
    {
        let ipld = Ipld::Map(BTreeMap::from([("value".into(), Ipld::Integer(42))]));
        match MyEnum::deserialize(ipld) {
            Ok(deserialized) => {
                assert_eq!(deserialized, MyEnum::Bar { value: 42 });
            }
            Err(e) => {
                panic!("{e:?}");
            }
        }
    }
}
SerdeError("invalid type: integer `42` as i128, expected any value")

I think this is a serde's unresolved issue, so it may not be the responsibility of this library.

However, we can work around this problem by changing Ipld::Integer to have i64 instead of i128. my experimental branch: https://github.com/sugyan/rust-ipld-core/commit/539530d6d9f20e9e34143d9a9d8f92375bbb0bf6

Since the IPLD specification requires minimum support for values up to 2^53, the change to i64 is destructive but still meets the specification. https://ipld.io/design/tricky-choices/numeric-domain/#integers

Is it possible to change it so that, for example, "if a certain feature flag is specified, the Ipld::Integer of i64 is used"?

My background is that I am developing a library atrium, and I would like to provide a function to hold unknown values received from the server as an Ipld and deserialize them to an arbitrary type on the user side. However, I am having trouble converting from Ipld to other types due to this problem...

vmx commented 1 month ago

For me using i128 for an IPLD Integer was because we can do that easily. But it sucks that it's breaking with Serde features people use. I also expect IPLD being used mostly through Serde.

Adding a feature is certainly possible, though I generally prefer taking the time to think about it a bit, whether there's a way so that we don't end up with an infinite list of knobs one could tweak (and end up in maintenance/support hell).

Hence I wonder if it makes sense to change the Integer type by default to i64. This is a big breaking change in theory, but I don't think the impact is that big practically. Many programming language have as a maximum built-in integer type of 64-bit, so IPLD libraries in those languages likely use that type.

Another way to look at this is: this is another case where Serde untagged enums are a problem. So should we just add a new feature as proposed here and document it properly that if you used untagged enums, that you should enable that features?

@Stebalien Do you have any thoughts on this?

sugyan commented 1 month ago

Thanks for your thoughts on this!

As a library user (if we don't consider compatibility, etc.), would the ideal form be to use i64 by default, and then be able to change to use num-bigint by the feature flag if we want to handle larger values? (Even when dealing with bigint, the full serde functionality will probably not be available, so in this case serde support should either be disabled or limited, and it should be clearly stated in the documentation...)

Stebalien commented 1 month ago

So, the issue with just i64 is that u64::MAX is a nice and useful sentinel value (and isn't representable in i64). On the other hand, the Ipld thing is mostly for convenience when dealing with data-model types...

I wonder if we should have an IpldGeneric<P: IpldPrimitives> type aliased to Ipld<SaneDefaults>. Something like:

/// Doesn't have to be a trait, could just be type parameters but... that's a lot of type parameters.
pub trait Primitives {
    type Integer;
    type String;
    type Map;
    // ...
}
pub enum IpldGeneric<P: Primitives> {
    Integer(P::Integer),
    String(P::String),
    // ...
}

type Ipld = IpldGeneric<DefaultPrimitives>;

That should give users the tools to do whatever they need. But it's also adding complexity.

(anyways, please ignore the 2^53 thing)

vmx commented 1 month ago

@Stebalien That's an interesting idea. Having used Rust for a while, I wonder if that adds complexity at unintended places.

I wonder if that makes interoperability between systems/libraries a problem. But maybe it's also a win as this way it breaks in kind of visible ways and not just subtle due to random feature flags.

@sugyan would you be open to give that idea a chance and try it out. I'd imagine that atrium is a large enough system to get an idea whether this ends up in unusable complexities or is quite easy to use.

sugyan commented 1 month ago

@vmx I personally don't care what implementation changes are made, as long as they solve the enum deserialize failures that are currently troubling atrium anyway.

I think it would be a great implementation to allow users to choose the type they want to use more flexibly. However, I can't imagine that there would be any case where the user would want to change the primitive type other than to make the integer i64 or larger.

vmx commented 1 month ago

@sugyan I've created a PR with @Stebalien's suggestion at https://github.com/ipld/rust-ipld-core/pull/21. It would be cool if you could try it out and also let me know if you ran into any weird issues. My hope would be that things would be smooth.

You would add something like this to your code:

#[derive(Clone)]
pub struct PrimitivesI64;

impl Primitives for PrimitivesI64 {
    type Bool = bool;
    type Integer = i64;
    type Float = f64;
    type String = String;
    type Bytes = Vec<u8>;
    type Link = Cid;
}

type Ipld = IpldGeneric<PrimitivesI64>;
sugyan commented 1 month ago

Unfortunately, in my case this does not seem to work as expected.

https://github.com/sugyan/atrium/actions/runs/10372924349/job/28716796955?pr=212#step:3:187

Because the only implementation of Serializer or Deserializer in crate::serde::ser, crate::serde::de is for Ipld (it's just IpldGeneric<DefaultPrimitives>) and if we define new type with type Ipld = IpldGeneric<PrimitivesI64>, it cannot be serialized/deserialized them because there is no impl of Serializer or Deserializer for them.

vmx commented 1 week ago

@sugyan sorry for taking so long and thanks for giving #21 a try. Although it's kind of a nice idea to solve it with traits, the added complexity gets too much out of control for my taste. Thinking about it again, I think a feature flag might be the simplest and easiest to follow solution.

@sugyan if you could try out https://github.com/ipld/rust-ipld-core/pull/22 it would be great. It should be as simple as enabling the integer-max-i64 feature.

sugyan commented 1 week ago

@vmx Thank you for continuing to work on solving the problem! It seems to be very difficult to solve the problem with traits... The solution in #22 is relatively simple, I've confirmed that it works fine in my library! https://github.com/sugyan/atrium/pull/212