Advice for reusing member traits

kubukoz commented 4 months ago

Hi!

In my current organization, we use Smithy to define Databricks tables. These tables are populated with data coming from events, which are also modeled in Smithy.

Often, you want to reuse a set of fields from an event definition in the table definition. For cases where you want all or almost all of the fields, we've been using mixins: the mixin contains the fields, and both the event and the table inherit it.
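For context, a minimal sketch of that shared-mixin setup. The namespace and the name `CommonFields` are made up for illustration; the members mirror the `s1`/`s2` example used further down.

```smithy
$version: "2"

namespace example.tables

// Hypothetical mixin holding the members shared by the event and the table.
@mixin
structure CommonFields {
    @documentation("s1")
    s1: String

    @documentation("s2")
    s2: String
}

// Both shapes inherit every member of the mixin, including the member traits.
structure MyEvent with [CommonFields] {}

structure MyTable with [CommonFields] {}
```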

The problem begins when the tables are no longer intended to match the event's structure exactly: we call these curated tables. In this world, you want to pick and choose which fields to keep, possibly flatten the structure into something with fewer levels of nesting, apply some renaming (with a trait or otherwise), and so on. At the same time, the intention is to reuse the traits that were originally applied to the member being referenced.

Semantically, this sounds a lot like having the ability to reference members as member targets:

structure MyEvent {
  @documentation("s1")
  s1: String
  @documentation("s2")
  s2: String
}

structure MyTable {
  s_one: MyEvent$s1
}

but that's currently not allowed by Smithy, presumably for valid reasons.

What would you recommend we do in such cases? Is there something Smithy could/should be doing to help such use-cases?

Workarounds

We have a few workarounds, but none of them is really convenient, hence my request for advice.

Define a mixin for each field

@mixin structure S1Field {
  @documentation("s1")
  s1: String
}

structure MyEvent with [S1Field] { ... }
structure MyTable with [S1Field] {
  @tableFieldName("s_one") $s1
}

This solves the problem of reusing all the traits of the field, but introduces the obvious boilerplate of defining a mixin for every member.

In addition, it doesn't allow renaming, unless an extra trait is defined.
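A self-contained sketch of this workaround, including the extra renaming trait. `tableFieldName` is not a built-in Smithy trait; the definition below (a string trait restricted to structure members) is an assumption about how it could be declared, and the namespace is made up.

```smithy
$version: "2"

namespace example.tables

// Hypothetical trait carrying the column name to use in the table.
// Smithy attaches no behavior to it; tooling would have to interpret it.
@trait(selector: "structure > member")
string tableFieldName

// One mixin per reusable field.
@mixin
structure S1Field {
    @documentation("s1")
    s1: String
}

structure MyEvent with [S1Field] {}

structure MyTable with [S1Field] {
    // Adds a trait to the inherited member; the member itself is still named s1.
    @tableFieldName("s_one")
    $s1
}
```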

Copy-paste the traits

structure MyEvent {
  @documentation("s1")
  s1: String
  @documentation("s2")
  s2: String
}

structure MyTable {
  @documentation("s1")
  s_one: String
}

This solves both problems, but allows the members' trait applications to diverge. That can be mitigated by linking the definitions with another trait, backed by a validator that checks that the table member's traits are a superset of the original member's traits.

structure MyEvent {
  @documentation("s1")
  s1: String
  @documentation("s2")
  s2: String
}

structure MyTable {
  @documentation("s1")
  @traitsMatch(MyEvent$s1)
  s_one: String
}

That seems like a worrying amount of complexity.
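To give a sense of what this involves, here is a rough sketch of how the linking trait might be declared. `traitsMatch` is hypothetical, and the use of `@idRef` to require that the value points at an existing member is one possible design, not something the thread prescribes. The superset check itself is not expressed in the model; it would presumably need a custom validator written against Smithy's Java API, which is not shown.

```smithy
$version: "2"

namespace example.tables

// Hypothetical linking trait: its value is the shape ID of the source member.
// @idRef makes Smithy check that the value refers to an existing member shape.
@trait(selector: "structure > member")
@idRef(failWhenMissing: true, selector: "member")
string traitsMatch

structure MyEvent {
    @documentation("s1")
    s1: String
}

structure MyTable {
    @documentation("s1")
    @traitsMatch(MyEvent$s1)
    s_one: String
}
```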

Extract the traits to a simple shape

@documentation("s1")
string S1Field

structure MyEvent { s1: S1Field }
structure MyTable {
  s_one: S1Field
}

This is probably the cleanest solution Smithy-wise. However, some code generation tools (like smithy4s) distinguish between custom string shapes and string members, and will generate newtypes instead of plain strings. This means that such a refactor will require code changes on the application's side to migrate off "raw" strings.

mtdowling commented 4 months ago

I think the Smithy way, as you point out, is to make dedicated shapes that store the reusable traits on the shape rather than the member. This is the route I would take. Nothing comes to mind on anything else we could do here for these kinds of transformations.

Maybe you can use dedicated shapes and add a trait to smithy4s to tell it to not give some shapes a dedicated named type?

kubukoz commented 4 months ago

> Maybe you can use dedicated shapes and add a trait to smithy4s to tell it to not give some shapes a dedicated named type?

I was thinking about it too. Now that you've said it out loud, I think that may be a decent workaround :)

Thanks, I'll see what the other smithy4s maintainers think.

kubukoz commented 4 months ago

For future generations: FYI, smithy4s already has a smithy4s.meta#unwrap trait which does this. A type is still generated for the shape, but its underlying value is used everywhere else.
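A minimal sketch of how that could look when combined with the "extract the traits to a simple shape" workaround. It assumes the smithy4s trait definitions are available on the model's classpath; the namespace is made up.

```smithy
$version: "2"

namespace example.tables

use smithy4s.meta#unwrap

// The reusable traits live on the shape; @unwrap asks smithy4s to use the
// underlying string type for members that target it.
@unwrap
@documentation("s1")
string S1Field

structure MyEvent {
    s1: S1Field
}

structure MyTable {
    s_one: S1Field
}
```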

smithy-lang / smithy