lerouxrgd / rsgen-avro

Command line and library for generating Rust types from Avro schemas
MIT License
37 stars 29 forks source link

Fix inconsistent enum default name generation #57

Closed oshevtsov closed 1 year ago

oshevtsov commented 1 year ago

When working on a project that performs code generation from Avro schema files, I have encountered an issue with enums. This seems to be an edge case that is easy to overlook, but the generated code does not compile. The problem is when an enum is defined as a struct field and has a default symbol that is only two characters long, starts with a small letter, and ends with a capital letter (e.g. mJ, see example below).

Minimal example:

{
    "type": "record",
    "name": "MyRecord",
    "namespace": "com.example.data",
    "doc": "Record doc",
    "fields": [
        {
            "name": "id",
            "type": "int",
            "doc": "Identifier"
        },
        {
            "name": "unit",
            "type": {
                "type": "enum",
                "name": "MeasurementUnit",
                "symbols": [
                    "mJ"
                ]
            },
            "doc": "Measurement unit",
            "default": "mJ"
        }
    ]
}

The issue manifests itself at line 1035 of src/templates.rs, where s.to_upper_camel_case() is called twice when constructing the default symbol name (in contrast, this method is called only once when generating the enum definition). For the example above, the sequence of modifications is as follows: mJ -> MJ -> Mj.

The generated code looks as follows:

#[derive(Debug, PartialEq, Eq, Hash, PartialOrd, Ord, Clone, serde::Deserialize, serde::Serialize)]
pub enum MeasurementUnit {
    #[serde(rename = "mJ")]
    MJ,
}

/// Record doc
#[derive(Debug, PartialEq, Eq, Clone, serde::Deserialize, serde::Serialize)]
pub struct MyRecord {
    pub id: i32,
    #[serde(default = "default_myrecord_unit")]
    pub unit: MeasurementUnit,
}

#[inline(always)]
fn default_myrecord_unit() -> MeasurementUnit { MeasurementUnit::Mj }

See the difference between MeasurementUnit::MJ and MeasurementUnit::Mj. The proposed change is minimal and makes sure that s.to_upper_camel_case() is called only once both when generating the enum definition and when generating the default initializer function inside the record.

lerouxrgd commented 1 year ago

Hello, thanks for reporting this. Maybe you can keep the sanitize call ?

oshevtsov commented 1 year ago

Sure I can :slightly_smiling_face:. However, the variable s has been already converted to upper camel case and sanitized, see https://github.com/lerouxrgd/rsgen-avro/blob/master/src/templates.rs#L1033. So, it should look exactly as the corresponding name inside the enum definition. Look at https://github.com/lerouxrgd/rsgen-avro/blob/master/src/templates.rs#L428.

IMHO it should be as it is right now. If you feel like it is necessary to sanitize it again (to be on a safe side), then I would do it at both places, but not only one. However, since it is already sanitized, there is no need to do it again.

lerouxrgd commented 1 year ago

Sure you are right, thanks for the fix ! I will make a release as soon as I can.

oshevtsov commented 1 year ago

Thank you for looking at this so quickly :rocket:

lerouxrgd commented 1 year ago

Released in 0.12.2