julianpeeters / avro-scala-macro-annotations

Compile-time tools for working with Avros in Scala
Apache License 2.0
55 stars 16 forks source link

Support for nesting in a more hierarchical manner #31

Open cory-p-oncota opened 8 years ago

cory-p-oncota commented 8 years ago

This is possibly out of the scope of this library but it would be nice to handle nested annotations as such:

{
  "type": "record",
  "name": "TestMessage",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "message", "type": "string"},
    {
      "name": "metaData",
      "type": "com.julianpeeters.example.Metadata"
    }
  ]
}

And the nested message:

{
  "type": "record",
  "name": "MetaData",
  "namespace": "com.julianpeeters.example",
  "fields": [
    {"name": "source", "type": "string"},
    {"name": "timestamp", "type": "string"}
  ]
}

This would definitely improve the usability of the library in situations where large data models are represented. Would be glad to take up this work with some guidance on where to start looking. Perhaps a plug-in approach would be best (lots of features I'd like to use in conjunction with Avro as Scala macros take off; validation, type providers, etc)?

julianpeeters commented 8 years ago

Hi Cory, great idea.

Currently AvroTypeProviderMacro uses FileParser.getSchemas(infile), which uses the org.apache.avro.Schema.Parser to parse each file.

I think the problem is that I used a new org.apache.avro.Schema.Parser for each file, instead of using (and re-using) a static one like this SO question recommends.

My time frame to explore this is likely a good amount time from now, but If you'd like to take a shot at it, I've reformatted your example schemas to fit with current tests (thanks for those!), added a test that is currently failing, and pushed it up to a new branch.

If for some reason the static Parser doesn't do the trick, I might look into the maven-avro-plugin to see how they accomplish this (sounds like it can), or start thinking about a simple way to store the schemas or generated classes.

I'm not sure I understand what you mean by "plugin", so maybe this is irrelevant but, "plugin" as in build-tool plugin like an sbt-plugin? I see macros as a build-tool agnostic replacement for plugins (eh, both have other pros and cons), and I've got some sbt-plugin projects that I wish were easier to keep consolidated with their main project. I haven't looked into the use-case at all, but my inclination would be to first look into how we could add a @AvroValidation macro? For example, @AvroTypeProviders is already a macro (I feel like you knew that though, so sorry if I'm off the mark here).

Thanks for the interest and throwing around these good ideas!

cory-p-oncota commented 8 years ago

I will start by giving a shot at using a single parser. It sounds like this could very well do the trick for this and may in fact be intended behavior for an Avro implementation.

In regards to the plugin idea, maybe it's best to move the conversation to another issue as a feature enhancement or possibly even into Avro's codebase itself.

My current primary use case is based around the fact that Avro currently only supports structural validation (to my knowledge). In my application, we are looking to replace Joi with a language agnostic solution and implementing Avro for serialization and schemata is just the first step. Like Joi or JSON-Schema, I'd love to implement business rules (constraints, dynamic requires, etc) into my models so they can be abstracted from the application layer in consuming microservices.

I see this coming about as some form of mix-in with Avro but I will take a better look at the parser to see if custom fields can actually be supplied and go from there.

Cheers on the contributions and use of macros. Very much looking forward to the ways libraries like this can change my workflow!

julianpeeters commented 8 years ago

I'd love to implement business rules (constraints, dynamic requires, etc) into my models so they can be abstracted from the application layer in consuming microservices

Rad.

Very much looking forward to the ways libraries like this can change my workflow

Hehe, same.

cory-p-oncota commented 8 years ago

Reaching out for thoughts since I hit a bit of a road block in implementing this. Essentially, simply implementing the parser as a property of the FileParser object worked and will successfully load the schemas, however, the change broke the existing tests on the AvroTypeProviderTestNestedSchemaFile schema because the schema has already been loaded when we are trying to generate the MetaData case class.

We reach a "chicken-or-egg" situation because the obvious solution is to check the parser's definition for the fully qualified name via schema.getTypes() prior to add the schema to the parser, however, in order to do so, we must parse it with a temporary parser in order to get the name, effectively hitting the same error as before when working the separate files (the parent record fails on using the temporary parser since order matters).

Any ideas aside from implementing a manual JSON parser to cache/check the name prior to even running through the parser?

cory-p-oncota commented 8 years ago

I've partially solved the above using a couple of methods together:

1) FileParser now extends Parser directly in singleton fashion

2) AvroTypeProviderMacro now caches file names which it has already processed and looks them up from FileParser when already processed in order to get the correct schema definition

3) SchemaStore is eliminated and schemas generated via AvroRecord are added to the FileParser. This is possible because Parser already implements all features of the SchemaStore.

This commit has the changes thus far, however, I believe we're hitting a point where an opinionated approach is required. The default value tests 15-18 are failing due to the fact that they use AvroTypeProviderTestDefaultValue00 which has already been defined and thus cannot be redefined. Essentially, when using a single parser approach the opinion must be held that one cannot implement a message definition in more than one avsc file. Thoughts?

julianpeeters commented 8 years ago

Excellent work. I hadn't realized that my misuse of the parser had led to illegal schema definitions. Unfortunately it also reveals that this feature request should in fact be to Avro, for the same reason. IMO it's a worthwhile feature to add to Avro, and if you feel you're energies could be better spent there, here I'd be grateful to accept a pull request of what you've got so far and fix the tests to conform to Avro's opinion when I get the time.