target / theta-idl

Define communication protocols between applications using algebraic data types.
Other
45 stars 9 forks source link

Remove Avro rendering restriction on top-level schemas #42

Closed jack-davies closed 2 years ago

jack-davies commented 2 years ago

Inspired by discussion in https://github.com/target/theta-idl/pull/39, a minimal attempt to remove the restrictions on top-level schema rendering.

This seems to behave well in my own testing (outside of nix), would be great if you could give it a look over to see if this all seems sensible to you @TikhonJelvis.

Not sure which branch you'd prefer me to base off here but I am working from stage.

TikhonJelvis commented 2 years ago

Yeah, stage is the branch to work off. Was just talking about this in another thread—sounds like the names stage and main are a bit confusing for what the branches are supposed to be, so I'm thinking of renaming them to main and release respectively.

Looks like a solid change overall. The build is failing because I use -Werror in CI and there's an unused import.

Seems like theta avro all will generate Avro schema files for things like newtypes/aliases over primitive types[^primitives] after this change. As a user, is this what you would expect for the command? It's always been a bit of a wonky convenience function, so I'm not sure what the “reasonable” semantics for it would be.

I can see a few reasonable ways to avoid this:

  1. Add an explicit filter for primitive aliases/newtypes in theta avro all
  2. Restrict theta avro all to the modules explicitly passed in to the command rather than using Theta.transitiveImports
  3. All of the above

What seems like the best way to handle this?

[^primitives]: I believe this doesn't happen right now, but I've also thought about having a base module with definitions for each primitive type that gets imported implicitly. If this happened, theta avro all would generate files for each primitive type by itself. I can see making this change in the future if it makes things more regular or consistent throughout the codebase, so it's worth considering.

jack-davies commented 2 years ago

Apologies for the delay on this, it should pass now with -Werror.

Seems like theta avro all will generate Avro schema files for things like newtypes/aliases over primitive types1 after this change. As a user, is this what you would expect for the command? It's always been a bit of a wonky convenience function, so I'm not sure what the “reasonable” semantics for it would be.

So one of the use cases we have is to generate Avro as an intermediate, which we then pass to avro-ts to generate TypeScript interfaces for our frontend. While it seems a little odd at first glance to have e.g. in the intermediate stage Foo.avro containing just "string", having a faithful intermediate representation of every type allows us to generate everything we need comprehensively-- in this case the rendered TypeScript type Foo = String.

Your call whether you want to proceed with this, but in my opinion as this is conformant with the Avro spec it makes sense to be comprehensive here.

TikhonJelvis commented 2 years ago

Nice, looks good. I can merge this PR as-is once the tests pass. Happy to change the behavior of theta avro all down the line if it ends up inconvenient.

So one of the use cases we have is to generate Avro as an intermediate, which we then pass to avro-ts to generate TypeScript interfaces for our frontend.

Good to know. When Theta generates an Avro schema, it inlines definitions for each type referenced in the top-level type—how does avro-ts deal with that? As far as I know Avro schemas themselves don't have a mechanism for importing or splitting definitions over multiple files.

dmvianna commented 2 years ago

We create a single type definition with all the types we need and convert that. Else we would end up with multiple redundant, conflicting copies of the same definitions. I don't think avro-ts does anything more than theta does.

TikhonJelvis commented 2 years ago

Cool, makes sense. Sounds like you've developed a pipeline that does what you need. Always happy for any suggestions on how to make your workflow easier :).

dmvianna commented 2 years ago

My pie in the sky would be getting a deduplicated TypeScript module with all the types provided by a single call to theta. Like

theta typescript all -m animal.dog -m animal.cat -o animals

What we currently do is instead

theta avro type -t animal.Chimera -o animals_temp
<some jq sorcery>
find animals_temp -iname '*.avsc' | xargs npx avro-ts -O animals
TikhonJelvis commented 2 years ago

Makes sense.

I see at least two things we could do:

  1. Output deduplicated Avro definitions for multiple types in a format that other commands can consume
  2. Generate Typescript types directly from Theta

These don't have to depend on each other, so it could make sense to do one or both.

1

Deduplicated Avro definitions would be useful for passing to other commands in general—not just Typescript. Seems more useful than what theta avro all does today. We could return a JSON list with an entry for each type defined in a set of modules. This is valid as an .avsc file and I've seen people recommend this as a way to define multiple schemas in a single file—if I'm understanding the Avro specification correctly this would technically define a union of all the types.

theta avro all could output this list over STDOUT instead of creating directories and files, which I would prefer over the current logic. Interpreting the output as a union schema might be a bit wonky, but it would be a handy format for streaming into jq or processing in other languages.

I expect this change would require some refactoring in Theta.Target.Avro.Types, but unless I'm missing something, it would not be a massive change.

2

Separately, we could support generating Typescript directly from Theta. I wouldn't want to depend on tsc and avro-ts in Theta itself, so this would be writing code generation in Haskell.

Code generation would let Typescript be a first-class citizen in Theta: generating types would take a single command and we'd have more control over what the generated types look like. One of the main reasons I wrote Theta in the first place was that Haskell types generated from Avro schemas were awkward to work with; since Theta is structured around algebraic data types, we could generate normal Haskell types with nice constructors rather than dealing with some kind of awkward encoding of unions.

I haven't touched Typescript—or even JavaScript—in six or seven years, so I'm not sure what idiomatic types would look like. I gather that Typescript uses union types (ie not tagged unions) widely, so the idiomatic design would look pretty different from Haskell or Rust.

I've added support for several languages to Theta and my experience has been that writing code for generating types usually just takes a day or two—that's all I did for Kotlin, and it took me longer to set up a Kotlin environment + tests than it did to write the code itself :). Hooking the types up to read/write Avro was harder, but with Typescript we would be leaving that up to avsc instead.

dmvianna commented 2 years ago

We had a similar journey, and that's why we're happy with Theta 😉 . Yes, frankly I much prefer STDOUT than -o. I'm not the TypeScript person in the team, so I can't help with being idiomatic.

Both 1. and 2. seem very sensible to me.