Fix: Fix incorrect casing when rendering fields and enums

marcosschroh / dataclasses-avroschema

Generate avro schemas from python classes. Code generation from avro schemas. Serialize/Deserialize python instances with avro schemas

https://marcosschroh.github.io/dataclasses-avroschema/

MIT License

213 stars 64 forks source link

Fix: Fix incorrect casing when rendering fields and enums #694

Closed sanjayjayaramu-cruise closed 1 month ago

sanjayjayaramu-cruise commented 1 month ago

Fixes https://github.com/marcosschroh/dataclasses-avroschema/issues/693 https://github.com/marcosschroh/dataclasses-avroschema/issues/692 This change fixed incorrect case rendering of fields inside a record which has the same record type and uses a reference of the type to determine the type of the field. Since the field name is converted to Pascal case for Record types, the reference type name should also be Pascal cased. The Enum class rendering has a similar issue where the default value of the enum field does not contain the Pascal case of the Enum class.

marcosschroh commented 1 month ago

@sanjayjayaramu-cruise Could you fix the tests?

sanjayjayaramu-cruise commented 1 month ago

@sanjayjayaramu-cruise Could you fix the tests?

Still trying to get tests work locally, will fix them asap

sanjayjayaramu-cruise commented 1 month ago

@sanjayjayaramu-cruise Could you fix the tests?

Hey @marcosschroh , the problem with my approach here is that the round trip tests now fail tp generate the original schema from model classes since the change converts basically all dataclasses and enums to use pascal case. May be this is not desired and works for everyone?

sanjayjayaramu-cruise commented 1 month ago

@sanjayjayaramu-cruise Could you fix the tests?

Hey @marcosschroh , the problem with my approach here is that the round trip tests now fail tp generate the original schema from model classes since the change converts basically all dataclasses and enums to use pascal case. May be this is not desired and works for everyone?

Also note that, the round trip test(s) will fail for any nested record type on master and there is no unit test covering that. I think there is no 100% guarantee that the user can get back the schema using the model generator with the same casing as the original schema.
Here is an example schema and the resulting schema generated from the autogen class

    original_schema = {
        'type': 'record', 
        'name': 'prism', 
        'fields': [
            {
             'name': 'pose', 
             'type': {
                 'type': 'record', 
                 'name': 'pose', 
                 'fields': [{
                     'name': 'pose_name', 
                     'type': 'string'
                     }]
                }
            }, 
        ]
    }

After

{
    "type": "record",
    "name": "Prism",
    "fields": [
        {
            "name": "pose",
            "type": {
                "type": "record",
                "name": "Pose",
                "fields": [
                    {
                        "name": "pose_name",
                        "type": "string"
                    }
                ]
            }
        }
    ]
}

Would you accept removing the round trip tests?

marcosschroh commented 1 month ago

Hey @sanjayjayaramu-cruise

The problem that you are facing is overshadowed. It will never work because in python you can not have this:

@dataclasses.dataclass
class MslPrism(AvroModel):
    """
    The 3D bounding box labels of this object.
    """
    data_source: data_source = dataclasses.field(metadata={'doc': 'Source of the ingested segment, ie. ON_ROAD, SIMULATION, etc.'}, default=data_source.UNKNOWN)

The problem is described in the docs and it is independent of this library, the same happens in pydantic or any library. The field name CAN NOT be the same as the type.

There are 2 ways to mitigate this:

Have a "proper schema", making sure that the field name is different from the type
Modify your PR and add an extra property the enum generated, called schema_name. This new property can be used to generate the enum name rather than class.__name__. Similar to what is describe in the docs for Records and Meta.schema_name

By the way, there are tests for record relationship: OneToOne, OneToMany using array, OneToMany using maps and Self Relatinship

sanjayjayaramu-cruise commented 1 month ago

Modify your PR and add an extra property the enum generated, called schema_name. This new property can be used to generate the enum name rather than class.name. Similar to what is describe in the docs for Records and Meta.schema_name

Could you expand a little on this? Do you mean, only when the field name == type name, add a new Meta class to retain the schema main when we deserialize it for the tests?

marcosschroh commented 1 month ago

Could you expand a little on this? Do you mean, only when the field name == type name, add a new Meta class to retain the schema main when we deserialize it for the tests?

Yes and when the enum name does not follow the class convention naming in python. We should do the same for records.

If an enum name of record name defined in a avro schema does not follow the class naming convention in python, then we should add the property schema_name to class Meta which must contain the original name. If we do this then we can go from schema <--> Model.
If an enum name is the same as the field name, then we do the same. Possible for records as well.

There might be cases where the field_name and inner name (record nameofenum name) follow thepython namingconvention and they are the both equal. For this use cases the end user must solve it, either modifying the schema are using the propertyschema_name`.

marcosschroh commented 1 month ago

Could you expand a little on this? Do you mean, only when the field name == type name, add a new Meta class to retain the schema main when we deserialize it for the tests?

Yes and when the enum name does not follow the class convention naming in python. We should do the same for records.

If an enum name or record name defined in a avro schema does not follow the class naming convention in python, then we should add the property schema_name to class Meta which must contain the original name. If we do this then we can go from schema <--> Model.
If an enum name is the same as the field name, then we do the same. Possible for records as well.

There might be cases where the field_name and inner name (record name or enum name) follow the python naming convention and they are the both equal. For this use cases the end user must solve it, either modifying the schema are using the property schema_name.

sanjayjayaramu-cruise commented 1 month ago

Hey @marcosschroh I have updated the PR with the suggestions. PTAL when you can. Thank you.

codecov[bot] commented 1 month ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 99.48%. Comparing base (a4cc3d7) to head (412a47a).

Additional details and impacted files

```diff @@ Coverage Diff @@ ## master #694 +/- ## ======================================= Coverage 99.48% 99.48% ======================================= Files 34 34 Lines 1939 1944 +5 ======================================= + Hits 1929 1934 +5 Misses 10 10 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

sanjayjayaramu-cruise commented 1 month ago

It looks good @sanjayjayaramu-cruise . Some small changes and we are good to go.

Thank you. Applied the recommended changes.