hamba / avro

A fast Go Avro codec
MIT License

gen: Allow custom Go types via annotations #462

Open 0xjac opened 3 weeks ago

0xjac commented 3 weeks ago

Somewhat similar to #429, but more flexible and keeping the type annotations within the Avro schema: it would be great to be able to generate Go structs with custom types using annotations.

Specifically: add support for Go-specific annotations (go-type, go-key-type), similar to the Java ones but for Go. The Go types would be expected to implement encoding.TextMarshaler/encoding.TextUnmarshaler (taking advantage of #68 and #327).

Example

Schema

record MyRecord {
  @go-type("math/big.Float") string value;
  @go-key-type("go.custom.com/ident.ID4") map<@go-type("math/big.Float") string> balances;
  array<@go-type("math/big.Float") string> values;
  @go-type("github.com/google/btree.BTreeG[int]") array<string> totals;
}
This results in the following schema:

```avsc
{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "value",
      "type": { "type": "string", "go-type": "math/big.Float" }
    },
    {
      "name": "balances",
      "type": {
        "type": "map",
        "values": { "type": "string", "go-type": "math/big.Float" },
        "go-key-type": "go.custom.com/ident.ID4"
      }
    },
    {
      "name": "values",
      "type": {
        "type": "array",
        "items": { "type": "string", "go-type": "math/big.Float" }
      }
    },
    {
      "name": "totals",
      "type": {
        "type": "array",
        "items": "string",
        "go-type": "github.com/google/btree.BTreeG[int]"
      }
    }
  ]
}
```
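Before generating anything, a generator would have to find these properties in the schema. The following is a minimal sketch using only encoding/json (deliberately not hamba/avro's own schema API) that walks the schema above and collects every go-type and go-key-type property. collectGoTypes, walk and the "path@property" key format are hypothetical, and unions are not handled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// exampleSchema is the MyRecord schema from this issue.
const exampleSchema = `{"type":"record","name":"MyRecord","fields":[
 {"name":"value","type":{"type":"string","go-type":"math/big.Float"}},
 {"name":"balances","type":{"type":"map","values":{"type":"string","go-type":"math/big.Float"},"go-key-type":"go.custom.com/ident.ID4"}},
 {"name":"values","type":{"type":"array","items":{"type":"string","go-type":"math/big.Float"}}},
 {"name":"totals","type":{"type":"array","items":"string","go-type":"github.com/google/btree.BTreeG[int]"}}]}`

// collectGoTypes decodes an .avsc document and returns every go-type and
// go-key-type property, keyed by "path@property".
func collectGoTypes(avsc []byte) (map[string]string, error) {
	var root any
	if err := json.Unmarshal(avsc, &root); err != nil {
		return nil, err
	}
	out := map[string]string{}
	walk(root, "", out)
	return out, nil
}

func join(path, name string) string {
	if path == "" {
		return name
	}
	return path + "." + name
}

// walk visits one schema node. Primitive names such as "string" are plain
// JSON strings and carry no properties; unions (JSON arrays) are skipped too.
func walk(node any, path string, out map[string]string) {
	obj, ok := node.(map[string]any)
	if !ok {
		return
	}
	for _, prop := range []string{"go-type", "go-key-type"} {
		if v, ok := obj[prop].(string); ok {
			out[path+"@"+prop] = v
		}
	}
	switch obj["type"] {
	case "record":
		fields, _ := obj["fields"].([]any)
		for _, f := range fields {
			if field, ok := f.(map[string]any); ok {
				name, _ := field["name"].(string)
				walk(field["type"], join(path, name), out)
			}
		}
	case "array":
		walk(obj["items"], join(path, "items"), out)
	case "map":
		walk(obj["values"], join(path, "values"), out)
	}
}

func main() {
	props, err := collectGoTypes([]byte(exampleSchema))
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(props))
	for k := range props {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%-26s %s\n", k, props[k])
	}
}
```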

Gen cmd

avrogen -p main MyRecord.avsc

Actual Output

package main

// Code generated by avro/gen. DO NOT EDIT.

// MyRecord is a generated struct.
type MyRecord struct {
        Value    string            `avro:"value"`
        Balances map[string]string `avro:"balances"`
        Values   []string          `avro:"values"`
        Totals   []string          `avro:"totals"`
}

Desired Output

package main

import (
    "math/big"

    "github.com/google/btree"
    "go.custom.com/ident"
)

// Code generated by avro/gen. DO NOT EDIT.

// MyRecord is a generated struct.
type MyRecord struct {
    Value    big.Float               `avro:"value" json:"value"`
    Balances map[ident.ID4]big.Float `avro:"balances" json:"balances"`
    Values   []big.Float             `avro:"values" json:"values"`
    Totals   btree.BTreeG[int]       `avro:"totals" json:"totals"`
}

Notes

Potential pain points are:

  1. Extracting the actual type and the package to import from the fully qualified type in the go-type annotation.
     For complex cases, extra annotations should be considered. sqlc, which also allows Go type overrides, handles this decently. Taking inspiration from its documentation, this would define:
       1. go-type-import: the import path of the package.
       2. go-type-pkg: the package name, if it doesn't match the import path.
       3. go-type-name: the actual Go type name.
       4. go-type-ptr: whether to use a pointer or the type directly.
          (This could also be done with a ["null", T] union, but there might be cases where specifying a union in Avro is not desirable, yet a pointer is useful in Go.)
       5. Accordingly, for the map key type annotation: the corresponding go-key-type-import, go-key-type-pkg, go-key-type-name and go-key-type-ptr annotations.
  2. Generating clean import statements (without duplicates, in order and correctly formatted). This could be alleviated by running goimports.
  3. The last annotation in the example (@go-type("github.com/google/btree.BTreeG[int]") array<string> totals;) is trickier, as it overrides the array type itself. Marshaling to and from that type is not supported, since it is not a string type supporting encoding.TextMarshaler/encoding.TextUnmarshaler. Overriding array (and map!) types should be ignored (potentially with a warning or error) until marshaling can be handled for arrays, maps or even arbitrary types.
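The explicit annotations and the import clean-up described above could be sketched roughly as follows, assuming the go-type-import, go-type-pkg, go-type-name and go-type-ptr values have already been read into a struct. goType, Expr, mapExpr and importBlock are hypothetical names, and the standard-library vs third-party split uses a common "no dot in the first path element" heuristic rather than goimports' actual logic.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strconv"
	"strings"
)

// goType mirrors the hypothetical go-type-import, go-type-pkg,
// go-type-name and go-type-ptr annotations.
type goType struct {
	Import string // e.g. "math/big"; empty for predeclared types
	Pkg    string // optional; defaults to the last import path element
	Name   string // e.g. "Float" or "BTreeG[int]"
	Ptr    bool
}

func (t goType) pkgName() string {
	if t.Pkg != "" {
		return t.Pkg
	}
	return path.Base(t.Import)
}

// Expr renders the field type, e.g. "*big.Float".
func (t goType) Expr() string {
	expr := t.Name
	if t.Import != "" {
		expr = t.pkgName() + "." + expr
	}
	if t.Ptr {
		expr = "*" + expr
	}
	return expr
}

// mapExpr combines go-key-type-* and go-type-* annotations.
func mapExpr(key, val goType) string { return "map[" + key.Expr() + "]" + val.Expr() }

// importBlock dedupes imports by path (first annotation wins), sorts them and
// puts standard-library paths (no dot in the first element, a common
// heuristic) before third-party ones. A name is written only when the package
// name differs from the last path element.
func importBlock(types []goType) string {
	specs := map[string]string{} // import path -> import spec
	var std, third []string
	for _, t := range types {
		if _, dup := specs[t.Import]; t.Import == "" || dup {
			continue
		}
		spec := strconv.Quote(t.Import)
		if name := t.pkgName(); name != path.Base(t.Import) {
			spec = name + " " + spec
		}
		specs[t.Import] = spec
		if first, _, _ := strings.Cut(t.Import, "/"); strings.Contains(first, ".") {
			third = append(third, t.Import)
		} else {
			std = append(std, t.Import)
		}
	}
	if len(specs) == 0 {
		return ""
	}
	sort.Strings(std)
	sort.Strings(third)
	var b strings.Builder
	b.WriteString("import (\n")
	for i, group := range [][]string{std, third} {
		if i == 1 && len(std) > 0 && len(third) > 0 {
			b.WriteString("\n") // blank line between the two groups
		}
		for _, p := range group {
			b.WriteString("\t" + specs[p] + "\n")
		}
	}
	b.WriteString(")\n")
	return b.String()
}

func main() {
	bigFloat := goType{Import: "math/big", Name: "Float"}
	id4 := goType{Import: "go.custom.com/ident", Name: "ID4"}
	tree := goType{Import: "github.com/google/btree", Name: "BTreeG[int]"}
	fmt.Print(importBlock([]goType{bigFloat, id4, bigFloat, tree}))
	fmt.Println(mapExpr(id4, bigFloat)) // map[ident.ID4]big.Float
}
```

For the example record, this yields the same import grouping as the Desired Output. Running the generated file through goimports or go/format afterwards would still be a sensible final step.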
nrwiersma commented 1 week ago

This is an interesting concept. I wonder if any other Go lib has implemented this, so we could compare the proposed annotations against it. I also wonder if the number of annotations could be reduced to just go-type and go-type-pkg, by putting the import path, name and pointer into go-type, e.g. @go-type("github.com/hamba/avro/*Schema")?

0xjac commented 1 week ago

I'm not aware of other libs doing type annotations for Avro. However, I did not come up with the idea myself: I took the Java example from the Avro specs and adapted it for Go.

Regarding reducing the number of annotations: that works in most cases, but there are caveats that require the extra annotations for edge cases.

According to the Go spec, an import path may contain almost any character. It is therefore a bit complicated to separate the import path from the rest without coming up with a mini markup language; I find it easier to have separate annotations.

However, a compiler "may also exclude the characters !"#$%&'()*,:;<=>?[\]^`{|} and the Unicode replacement character U+FFFD". I'm not sure whether the Go toolchain actually does this, but in practice I have never seen a third-party package whose import path did not look like a URL, so for most cases a simple rule can split the import path, package, type and pointer out of a @go-type annotation, similar to sqlc.

The annotation would consist of an optional * to indicate a pointer, the full import path (which must end with the package name), a ., and the type name, which looks like:

This should work in most cases. However, if for any reason the package name is not the suffix of the import path, if we need an import alias (for example, when using data types from two "avro" libs), or if any other odd edge case comes up, we need to be able to specify everything (import path, package name, type, alias, pointer) explicitly.
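The split rule described above (an optional leading *, an import path ending in the package name, a ., then the type) could be sketched as below. parseGoType is a hypothetical helper, and its pkgOverride parameter stands in for a go-type-pkg annotation. It splits on the last dot before any type arguments, since a Go type name cannot contain a dot. Note that it does not accept the github.com/hamba/avro/*Schema form suggested earlier in the thread.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// parsedType holds the parts of a compact go-type annotation.
type parsedType struct {
	Import string // import path, e.g. "github.com/google/btree"
	Pkg    string // qualifier used in generated code
	Name   string // type name, including any type arguments
	Ptr    bool   // true if the annotation started with '*'
}

// parseGoType splits an annotation of the form [*]import/path.Type[Args].
// pkgOverride plays the role of a hypothetical go-type-pkg annotation; when
// empty, the package name is assumed to be the last import path element.
func parseGoType(annotation, pkgOverride string) (parsedType, error) {
	var t parsedType
	s := annotation
	if strings.HasPrefix(s, "*") {
		t.Ptr, s = true, s[1:]
	}
	// Type arguments may contain dots and slashes of their own, so only the
	// part before the first '[' is searched.
	base, args := s, ""
	if i := strings.IndexByte(s, '['); i >= 0 {
		base, args = s[:i], s[i:]
	}
	// A Go type name cannot contain '.', so the last dot ends the import path.
	dot := strings.LastIndexByte(base, '.')
	if dot <= 0 || dot == len(base)-1 {
		return parsedType{}, fmt.Errorf("go-type %q: want [*]importpath.Type", annotation)
	}
	t.Import, t.Name = base[:dot], base[dot+1:]+args
	t.Pkg = pkgOverride
	if t.Pkg == "" {
		t.Pkg = path.Base(t.Import)
	}
	return t, nil
}

func main() {
	for _, a := range []string{
		"math/big.Float",
		"*math/big.Float",
		"github.com/google/btree.BTreeG[int]",
	} {
		t, err := parseGoType(a, "")
		fmt.Printf("%-40s %+v %v\n", a, t, err)
	}
	// With a /v2 suffix the last path element is not the package name,
	// so an explicit package name is needed.
	t, _ := parseGoType("github.com/hamba/avro/v2.Schema", "avro")
	fmt.Printf("%+v\n", t)
}
```

The /v2 case illustrates why a package override is needed: the last element of github.com/hamba/avro/v2 is v2, while the package is named avro. gopkg.in/yaml.v3 (package yaml) is a similar real-world case.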

nrwiersma commented 1 week ago

The format proposed for @go-type seems quite good. Personally, I also always prefer starting simple and dealing with edge cases as they arise and become concrete. I think it is clear that a second annotation like @go-type-pkg will be needed, since it is not uncommon for the package name and import path to differ.