lerouxrgd / rsgen-avro

Command line and library for generating Rust types from Avro schemas
MIT License

support of metadata #32

Open untereiner opened 2 years ago

untereiner commented 2 years ago

Hi,

I have Avro schemas in which metadata has been added to records. As written in the doc:

A JSON object, of the form:

{"type": "typeName" ...attributes...}

where typeName is either a primitive or derived type name, as defined below. Attributes not defined in this document are permitted as metadata, but must not affect the format of serialized data.

Does anyone have an idea how I could generate this metadata as part of an Avro trait without serializing it? Maybe #[serde(skip)], to skip both serialization and deserialization?

Would you accept a PR with such a feature?

lerouxrgd commented 2 years ago

Could you provide a more specific example of such a feature, with an example schema and the expected generated struct?

untereiner commented 2 years ago

Here is an example:

{
    "type": "record",
    "namespace": "Core",
    "name": "Ping",
    "protocol": "0",
    "messageType": "8",
    "senderRole": "client,server",
    "protocolRoles": "client, server",
    "multipartFlag": false,

    "fields":
    [
        { "name": "currentDateTime", "type": "long" }
    ]
}

{
    "type": "record",
    "namespace": "Core",
    "name": "Pong",
    "protocol": "0",
    "messageType": "9",
    "senderRole": "client,server",
    "protocolRoles": "client, server",
    "multipartFlag": false,

    "fields":
    [
        { "name": "currentDateTime", "type": "long" }
    ]
}

I think protocol, messageType, senderRole, protocolRoles and multipartFlag are metadata. They are in the schema, but they are constants: unlike the fields, their values cannot change.
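To make the distinction concrete: for a record, the Avro spec defines only the attributes type, name, namespace, doc, aliases and fields, so every other key in the object is metadata. A minimal sketch of that classification follows; custom_attribute_keys is a hypothetical helper written for illustration, not something rsgen-avro or apache-avro provides.

```rust
/// Attribute names the Avro specification defines for a record schema.
const RECORD_ATTRIBUTES: [&str; 6] = ["type", "name", "namespace", "doc", "aliases", "fields"];

/// Returns the keys that the spec does not define for records, i.e. the metadata.
/// Hypothetical helper, not part of any crate.
fn custom_attribute_keys<'a>(keys: &[&'a str]) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|key| !RECORD_ATTRIBUTES.contains(key))
        .collect()
}

fn main() {
    // Top-level keys of the Ping schema above.
    let ping_keys = [
        "type",
        "namespace",
        "name",
        "protocol",
        "messageType",
        "senderRole",
        "protocolRoles",
        "multipartFlag",
        "fields",
    ];
    println!("{:?}", custom_attribute_keys(&ping_keys));
}
```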

I do not exactly know how to represent them in Rust. Maybe something like:

pub mod core {
    pub struct Ping {
        pub current_date_time: i64,
    }

    // Schema metadata exposed as associated constants, not as fields.
    impl Ping {
        pub const MESSAGE_TYPE: &'static str = "8";
        pub const SENDER_ROLE: &'static str = "client,server";
        pub const PROTOCOL_ROLES: &'static str = "client, server";
        pub const MULTIPART_FLAG: bool = false;
    }

    pub struct Pong {
        pub current_date_time: i64,
    }

    impl Pong {
        pub const MESSAGE_TYPE: &'static str = "9";
        pub const SENDER_ROLE: &'static str = "client,server";
        pub const PROTOCOL_ROLES: &'static str = "client, server";
        pub const MULTIPART_FLAG: bool = false;
    }
}
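If the constants should be usable generically (the "Avro trait" idea from the first comment), one option is a trait with associated constants. This is only a sketch: MessageMetadata and describe are hypothetical names, and nothing here is generated by rsgen-avro today. Associated constants are not part of the struct's data, so a derive such as serde's has nothing to serialize for them.

```rust
/// Hypothetical trait exposing schema-level metadata as associated constants.
pub trait MessageMetadata {
    const MESSAGE_TYPE: &'static str;
    const SENDER_ROLE: &'static str;
    const PROTOCOL_ROLES: &'static str;
    const MULTIPART_FLAG: bool;
}

pub struct Ping {
    pub current_date_time: i64,
}

pub struct Pong {
    pub current_date_time: i64,
}

impl MessageMetadata for Ping {
    const MESSAGE_TYPE: &'static str = "8";
    const SENDER_ROLE: &'static str = "client,server";
    const PROTOCOL_ROLES: &'static str = "client, server";
    const MULTIPART_FLAG: bool = false;
}

impl MessageMetadata for Pong {
    const MESSAGE_TYPE: &'static str = "9";
    const SENDER_ROLE: &'static str = "client,server";
    const PROTOCOL_ROLES: &'static str = "client, server";
    const MULTIPART_FLAG: bool = false;
}

/// Reads the metadata of any message type without needing an instance.
fn describe<M: MessageMetadata>() -> String {
    format!("messageType={} multipart={}", M::MESSAGE_TYPE, M::MULTIPART_FLAG)
}

fn main() {
    let ping = Ping { current_date_time: 0 };
    let pong = Pong { current_date_time: ping.current_date_time };
    println!("{} at {}", describe::<Ping>(), ping.current_date_time);
    println!("{} at {}", describe::<Pong>(), pong.current_date_time);
}
```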

And I think it could reopen #23 because I am not sure this case is handled by the schema generation.

lerouxrgd commented 2 years ago

Sadly those are non-standard fields, so there is no way to know their (potentially nested) type. Moreover, there is no "catch all" variant for such metadata in the underlying apache-avro Schema enum, so I don't think there is a way to handle such a use case.

untereiner commented 2 years ago

I understand your point. These attributes are not part of the Avro spec. However, their presence in the schema is allowed by the spec. As for their types, I think it would be reasonable to limit them to the types the Avro spec already defines.

I have a data exchange protocol based on Avro schemas that relies on this possibility to add constants (without needing fields) at the protocol level.

martin-g commented 2 years ago

As mentioned by @lerouxrgd custom attributes are not supported yet by apache-avro.

We've just had a big headache due to the new impl for those in the C++ SDK:

In the end we agreed to make the custom attributes' values string-only. The user application could parse the value if needed. Please create a new JIRA ticket at https://issues.apache.org/jira/browse/AVRO for adding support for custom attributes in the Rust SDK. A PR with the actual implementation would be awesome too! :-)
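As a rough illustration of what "string-only values, parsed by the application" could look like on the Rust side, here is a sketch. The CustomAttributes alias and both helpers are assumptions made for this example and do not reflect an existing apache-avro API; it also assumes a non-string value such as multipartFlag would be stored as its string form.

```rust
use std::collections::BTreeMap;

/// Custom attributes as string-only key/value pairs (assumed representation).
type CustomAttributes = BTreeMap<String, String>;

/// Parses a boolean attribute; None if the key is absent or not "true"/"false".
fn bool_attribute(attrs: &CustomAttributes, key: &str) -> Option<bool> {
    attrs.get(key)?.parse().ok()
}

/// Splits a comma-separated attribute into trimmed items; empty if the key is absent.
fn list_attribute(attrs: &CustomAttributes, key: &str) -> Vec<String> {
    attrs
        .get(key)
        .map(|value| {
            value
                .split(',')
                .map(|item| item.trim().to_string())
                .collect::<Vec<String>>()
        })
        .unwrap_or_default()
}

fn main() {
    let mut attrs = CustomAttributes::new();
    attrs.insert("messageType".to_string(), "8".to_string());
    attrs.insert("protocolRoles".to_string(), "client, server".to_string());
    attrs.insert("multipartFlag".to_string(), "false".to_string());

    // The application decides how to interpret each string value.
    let message_type: Option<u32> = attrs.get("messageType").and_then(|v| v.parse().ok());
    println!("messageType = {:?}", message_type);
    println!("multipartFlag = {:?}", bool_attribute(&attrs, "multipartFlag"));
    println!("protocolRoles = {:?}", list_attribute(&attrs, "protocolRoles"));
}
```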

untereiner commented 2 years ago

@martin-g I looked very quickly at those issues. They mention custom attributes "at field level". Is this still meant to be a general implementation for custom attributes at any level?

I will open a ticket and try an implementation next week.

martin-g commented 2 years ago

According to the spec, attributes/metadata can sit next to "type", so I understand it to cover both top-level and field-level.

But top-level looks very much like file metadata. File metadata is supported in Rust SDK 0.14.0+!

untereiner commented 2 years ago

First a question: the spec calls these "metadata", so why call them "custom attributes" instead of metadata in the implementation?

I do not know what "Object Container Files" are or what they are used for. They are things:

For me the metadata belongs in the schema only, because of this from the spec:

but must not affect the format of serialized data

martin-g commented 2 years ago

Better to ask these questions on the dev@ mailing list.

First a question: the spec calls these "metadata", so why call them "custom attributes" instead of metadata in the implementation?

Not sure, but for me the answer is - consistency with the other SDKs.

I have started working on this and I will create a draft PR soon!