lerouxrgd / rsgen-avro

Command line and library for generating Rust types from Avro schemas
MIT License
37 stars 29 forks source link

Deserializing from avro_rs to rsgen_avro union fails with "not an enum" #18

Closed codehearts closed 3 years ago

codehearts commented 3 years ago

I'm trying to deserialize an avro_rs::types::Value to a struct generated by rsgen_avro, but the deserialization fails on the Option<UnionIntLong> field with avro_rs saying "not an enum".

/// Auto-generated type for unnamed Avro union variants.
#[derive(Debug, PartialEq, Clone, serde::Deserialize, serde::Serialize)]
pub enum UnionIntLong {
    Int(i32),
    Long(i64),
}

#[derive(Debug, PartialEq, Clone, serde::Deserialize, serde::Serialize)]
#[serde(default)]
pub struct Event {
    pub a: Option<UnionIntLong>,
}

impl Default for Event {
    fn default() -> Event {
        Event { a: None }
    }
}

let avro_value = Value::Record(vec![
    ("a".to_string(), Value::Union(Box::new(Value::Int(123))))
]);

let my_event = from_value::<Event>(&avro_value)?; // Err(avro_rs::Error::DeserializeValue("not an enum"))

What's happening?

The issue seems to be avro_rs using a custom deserializer, which the serde-generated implementation for UnionIntLong calls deserialize_enum on. Since the Avro-encoded union doesn't contain an enum, the deserialization fails.

I've set up a repo with a full reproduction of the issue and 2 proposed solutions. Both solutions might need to be behind a feature flag, since they may be breaking changes.

An easy fix

In my case, the caveat is fine because I'm casting both variants to a single Rust type in my codebase anyway.

A painful fix

This deserializes Avro types to their correct variants, so a long would deserialize as UnionIntLong::Long. The downside is how much additional code and complexity this adds.


There might be a better way of handling this, and maybe avro_rs is the right crate to implement a fix for this in, but I thought I'd open an issue here and see what the thoughts are.

lerouxrgd commented 3 years ago

Hi, thank you for the detailed report and fix suggestions.

I like the easy fix with #[serde(untagged)] but there is the issue you observed due to ordering of untagged enums:

Serde will try to match the data against each variant in order and the first one that deserializes successfully is the one returned.

The painful fix seems indeed painful for things like records.

I'll try to think about what's best, but I feel it will be the painful one...

lerouxrgd commented 3 years ago

Another possibility is to handle it directly in avro-rs as follows:

avro-rs patch ```diff diff --git a/src/de.rs b/src/de.rs index 5894298..fa005ce 100644 --- a/src/de.rs +++ b/src/de.rs @@ -34,6 +34,10 @@ pub struct EnumUnitDeserializer<'a> { input: &'a str, } +pub struct EnumUnionDeserializer<'a> { + input: &'a Value, +} + pub struct EnumDeserializer<'de> { input: &'de [(String, Value)], } @@ -78,6 +82,12 @@ impl<'a> EnumUnitDeserializer<'a> { } } +impl<'a> EnumUnionDeserializer<'a> { + pub fn new(input: &'a Value) -> Self { + EnumUnionDeserializer { input } + } +} + impl<'de> EnumDeserializer<'de> { pub fn new(input: &'de [(String, Value)]) -> Self { EnumDeserializer { input } @@ -134,6 +144,71 @@ impl<'de> de::VariantAccess<'de> for EnumUnitDeserializer<'de> { } } +impl<'de> de::EnumAccess<'de> for EnumUnionDeserializer<'de> { + type Error = Error; + type Variant = Self; + + fn variant_seed(self, seed: V) -> Result<(V::Value, Self::Variant), Self::Error> + where + V: DeserializeSeed<'de>, + { + match self.input { + Value::Boolean(_) => Ok(( + seed.deserialize(StringDeserializer { + input: "Bool".to_owned(), + })?, + self, + )), + Value::Int(_) => Ok(( + seed.deserialize(StringDeserializer { + input: "Int".to_owned(), + })?, + self, + )), + Value::Long(_) => Ok(( + seed.deserialize(StringDeserializer { + input: "Long".to_owned(), + })?, + self, + )), + _ => Err(de::Error::custom("BOOOOOM")), // TODO: handle correctly + } + } +} + +impl<'de> de::VariantAccess<'de> for EnumUnionDeserializer<'de> { + type Error = Error; + + fn unit_variant(self) -> Result<(), Error> { + Err(de::Error::custom("Unexpected unit variant")) + } + + fn newtype_variant_seed(self, seed: T) -> Result + where + T: DeserializeSeed<'de>, + { + seed.deserialize(&Deserializer::new(self.input)) + } + + fn tuple_variant(self, _len: usize, _visitor: V) -> Result + where + V: Visitor<'de>, + { + Err(de::Error::custom("Unexpected tuple variant")) + } + + fn struct_variant( + self, + _fields: &'static [&'static str], + _visitor: V, + ) -> Result + where + V: Visitor<'de>, + { + Err(de::Error::custom("Unexpected struct variant")) + } +} + impl<'de> de::EnumAccess<'de> for EnumDeserializer<'de> { type Error = Error; type Variant = Self; @@ -439,7 +514,9 @@ impl<'a, 'de> de::Deserializer<'de> for &'a Deserializer<'de> { Value::Record(ref fields) => visitor.visit_enum(EnumDeserializer::new(&fields)), // This has to be a unit Enum Value::Enum(_index, ref field) => visitor.visit_enum(EnumUnitDeserializer::new(&field)), - _ => Err(de::Error::custom("not an enum")), + + ref v => visitor.visit_enum(EnumUnionDeserializer::new(v)), + // ref v => Err(de::Error::custom(format!("not an enum: {:?}", v))), } } ```

But that implies to hardcode variant names in EnumUnionDeserializer::variant_seed which would be fined if rsgen-arvro were part of avro-rs but is problematic for now as those hardcoded names could be different (they depend on the deserialization target).

codehearts commented 3 years ago

That's my thought exactly, the painful fix is probably best but it's unfortunate that there's no good way to share the visitor code. If it's alright with you, I can get started on a PR to implement this and gate it behind a feature flag if you don't think it should be there by default. I've been trying to find a cleaner way to implement it, but I haven't been able to figure anything else out.

lerouxrgd commented 3 years ago

Sure I'd be glad to get some help so a PR is welcome. I don't think a feature flag is needed, it could be gated by a GeneratorBuilder flag like GeneratorBuilder::use_variant_access which is used in the Templater to customize templating.

I'll keep looking at serde-based solutions, maybe there is some serde magic that we could leverage here (to avoid to have some dedicated templating logic for this specific visitor).

lerouxrgd commented 3 years ago

Implemented in 0.9.6 for types supported by avro-rs's unions.