oxidecomputer / typify

compiler from JSON Schema into idiomatic Rust types
Apache License 2.0
393 stars 57 forks source link

Enums of named integer constants/enums with "other" variant #574

Open miwig opened 4 months ago

miwig commented 4 months ago

This schema:

{
    "title": "My enum",
    "oneOf": [
        {
            "const": 1,
            "description": "Foo",
            "type": "integer"
        },
        {
            "const": 5,
            "description": "Bar",
            "type": "integer"
        }
    ]
}

generates this non-functional output:

#[derive(Clone, Debug, Deserialize, Serialize)]
#[serde(untagged)]
pub enum MyEnum {
    Variant0(i64),
    Variant1(i64),
}

Desured output would be:

pub enum MyEnum {
    Foo = 1,
    Bar = 5,
}

I think the serde_repr crate could be used in this case.

Similarly, this:

{
    "title": "My enum",
    "anyOf": [
        {
            "const": 1,
            "description": "Foo",
            "type": "integer"
        },
        {
            "const": 5,
            "description": "Bar",
            "type": "integer"
        },
        {
            "type": "integer"
        }
    ]
}

generates

#[derive(Clone, Debug, Deserialize, Serialize)]
pub struct MyEnum {
    #[serde(flatten, default, skip_serializing_if = "Option::is_none")]
    pub subtype_0: Option<i64>,
    #[serde(flatten, default, skip_serializing_if = "Option::is_none")]
    pub subtype_1: Option<i64>,
    #[serde(flatten, default, skip_serializing_if = "Option::is_none")]
    pub subtype_2: Option<i64>,
}

Desired output would be something like

pub enum MyEnum {
    Foo = 1,
    Bar = 5,
    Other(i64)
}

I think for integer variants this would require custom (de)serialization code. However, this Other feature would also be useful for enums with string variants, where I believe it might be accomplished with #[serde(other)] Examples in the wild:

ahl commented 4 months ago

First, clearly typify is not doing a good job of handling oneOfs with constant values. This type is awful:

pub enum MyEnum {
    Variant0(i64),
    Variant1(i64),
}

I would note that if you use the title field rather than the description field you'll get:

pub enum MyEnum {
    Foo(i64),
    Bar(i64),
}

... but that's only slightly better looking and no more faithful.

Here's what I'd want to see (more or less).

pub enum MyEnum {
    Foo,
    Bar,
}

impl<'de> serde::Deserialize<'de> for MyEnum {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        match u64::deserialize(deserializer)? {
            1 => Ok(Self::Foo),
            5 => Ok(Self::Bar),
            _ => Err(D::Error::invalid_value(...)),
        }
    }
}

I don't think I'd use the discriminant value since 1. serde itself doesn't consider that (and I'm loath to pull in another serde-like dependency) and 2. we need something similar for other, constant, non-integral values.


With regard to that second type (the anyOf) can you tell me more about how folks would use the Rust type? Or is there a more significant example you can share? My concern is that one could either say MyEnum::Foo or MyEnum::Other(1) and those would be different values that were equivalent when serialized.

I get that's the point of anyOf but wouldn't we then be within our right so just always deserialize into the Other variant?

miwig commented 4 months ago

I don't think I'd use the discriminant value since 1. serde itself doesn't consider that (and I'm loath to pull in another serde-like dependency) and 2. we need something similar for other, constant, non-integral values.

Sure, that code looks good. Setting a discriminant would make serialization code shorter but doesn't help that much for deserialization since there's no from_discriminant. It might be useful to have it available though if the schema goes to the trouble of defining it. But yes, non-integral values need a different solution.

With regard to that second type (the anyOf) can you tell me more about how folks would use the Rust type? Or is there a more significant example you can share? My concern is that one could either say MyEnum::Foo or MyEnum::Other(1) and those would be different values that were equivalent when serialized.

I get that's the point of anyOf but wouldn't we then be within our right so just always deserialize into the Other variant?

Sure, looking purely at the schema, with anyOf all the other variants become obsolete once you have {"type":"integer"} in there. From that standpoint, just generating the field as i64 would be justifiable, and certainly more correct than what typify is doing now.
Semantically, I would understand Other to mean "none of the above", so always deserializing into that variant would be a semantic error. I suppose a more correct way to express this in the schema would be to add a not subschema excluding the known variants to the Other variant?

In any case, what I suspect the authors of that schema meant is: "These are the known values as of now, but there may be user-defined values or future extensions that should be accepted for forward compatibility".

ahl commented 4 months ago

Semantically, I would understand Other to mean "none of the above", so always deserializing into that variant would be a semantic error. I suppose a more correct way to express this in the schema would be to add a not subschema excluding the known variants to the Other variant?

In any case, what I suspect the authors of that schema meant is: "These are the known values as of now, but there may be user-defined values or future extensions that should be accepted for forward compatibility".

Agreed! I'm trying to figure out how we would infer that to be the case or come up with a heuristic that is generally applicable (i.e. rather than one that might be wrong). In general our current anyOf handling is terrible (and needs to be expanded into a oneOf powerset). Perhaps it would suffice to say "if one case overlaps with all others and no other overlap then we consider this to be discreet values plus a none-of-the-above". I think I'd still want to do something like this:

enum MyEnum {
    Foo,
    Bar,
    Other(MyEnumOther),
}
struct MyEnumOther(i64);

impl From<i64> for MyEnum {
    fn from(value: i64) -> Self {
        match value {
            1 => Self::Foo,
            5 => Self::Bar,
            n => Self::Other(n),
        }
    }
}

impl From<MyEnum> for i64 { .. }
impl From<MyEnumOther> for i64 {
    fn from(value: MyEnumOther) -> Self {
        value.0
    }
}

// .. and maybe this as well:
impl TryFrom<i64> for MyEnumOther {
    fn try_from(value: i64) -> Result<Self, ..> {
         match (value) {
            /* known values */ => Err(..),
            n => Ok(Self(n)),
        }
    }
}

This might be overly fussy but it would mean that a particular value had a single representation in the generated code.

miwig commented 4 months ago

That seems reasonable to me. For what it's worth, I don't know if this is a common pattern in the json schema world. I would be content to just write these types myself for now, but there it is somewhat annoying that the replacements in TypeSpaceSettings only seem to apply to types generated from references.