softwaremill / tapir

Rapid development of self-documenting APIs
https://tapir.softwaremill.com
Apache License 2.0

Module for JSON codec derivation #2923

Open adamw opened 1 year ago

adamw commented 1 year ago

Currently, to create a json body input/output, both a Schema and json-library-specific encoders/decoders are needed. This means that generic derivation is typically done twice (once for the json encoders/decoders, once for the schemas). Moreover, any customisations, such as the naming strategy, need to be duplicated for both the json library and the schemas, often using different APIs.
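To make the duplication concrete, here's a minimal sketch. It uses stand-in typeclasses (`JsonWriter`, `SchemaFor`) instead of the real `upickle.default.ReadWriter` and `sttp.tapir.Schema`, but the shape of the problem is the same: the snake_case naming decision is made twice, through two unrelated APIs.

```scala
// Stand-in typeclasses; assumptions, not the real upickle/tapir APIs
trait JsonWriter[T]:
  def write(t: T): String

trait SchemaFor[T]:
  def fieldNames: List[String]

case class Person(firstName: String, age: Int)

// snake_case naming applied once for the JSON encoder...
given JsonWriter[Person] with
  def write(p: Person): String =
    s"""{"first_name":"${p.firstName}","age":${p.age}}"""

// ...and duplicated, via a different API, for the schema
given SchemaFor[Person] with
  def fieldNames: List[String] = List("first_name", "age")
```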

It would be great to do the configuration and derivation once - but for that, we would need a module providing joint json encoder/decoder + tapir schema derivation. In other words, we would need to write code which derives a JsonCodec[T] (this includes the encode and decode functions, and the schema).

Doing this for all json libraries would be highly impractical, and a ton of work, for which we don't have resources. That's why I'd like to approach this using the json library that will be included in the Scala toolkit - that is, uPickle. uPickle can use a better derivation mechanism anyway (as our blogs have described), so it might be an additional win for our users.

Such a derivation would have to be written using a macro - and as we know, these are different in Scala 2/3. I think we should target Scala 3.

So summing up, the goal of the new module is to:

While it might seem that the derivation could be implemented using Magnolia, I think writing a dedicated macro, which could utilize Scala 3's Mirrors, would actually be better. First, we would directly generate the code, instead of generating an intermediate representation which is only converted to the final codec at run-time. That's a small performance win. But furthermore, we can provide better, contextual error reporting. And excellent error reporting is something I'd like to be a priority for this task. I've done some experiments with deriving Schema using a macro directly here, but the work there has unfortunately stalled.
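As a toy illustration of what Mirror-based derivation buys us (this is not the tapir macro, just the standard Scala 3 pattern): a case class' field names can be read at compile time directly from the mirror, with no intermediate runtime representation.

```scala
import scala.compiletime.constValueTuple
import scala.deriving.Mirror

case class Address(street: String, city: String)

// Inline so that the mirror's refined type (and thus the label tuple)
// is known at the call site and reduced at compile time.
inline def fieldNames[T](using m: Mirror.ProductOf[T]): List[String] =
  constValueTuple[m.MirroredElemLabels].toList.map(_.toString)
```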

As for configuring the derivation, we should take into account the following:

In the end, the user should get an alternative to the current import sttp.tapir.json.upickle.* + optional imports for auto-deriving uPickle's Reader/Writer & tapir's Schema; the alternative would define jsonBody etc., as the integration does today, plus provide the macro to derive the JsonCodec.

Summing up, the top-level requirements for the macro are:

kciesielski commented 1 year ago

Here are some notes after my initial analysis:

General remarks

Some of our requirements can be addressed with the @upickle.implicits.key annotation. I don't know if we can add annotations using macros; here's a thread where I'm asking for advice to figure this out. In cases where that's the only viable possibility, I've put a 🔑 icon to emphasize this.

Features

adamw commented 1 year ago

First, a side note - if you're not lucky on the scala-users forum, you can also try dotty discussions in the metaprogramming section: https://github.com/lampepfl/dotty/discussions/categories/metaprogramming

Second side note: I think a good "terminology fix" might be to call only "true" enumerations - that is, Scala 3 enums where all cases are parameterless - "enumerations". If the cases have parameters, that's only sugar for a sealed trait.
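The distinction reads directly off the Scala 3 syntax:

```scala
// A "true" enumeration: every case is parameterless
enum Color:
  case Red, Green, Blue

// Cases with parameters: sugar for a sealed trait + case classes,
// so for derivation purposes this is a coproduct, not an enumeration
enum Shape:
  case Circle(radius: Double)
  case Square(side: Double)
```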

What is kind of worrying is that some cases can only be handled with 🔑 . So either we find a way to add annotations to a type using macros, or ... ? I guess there's no alternative really.

Well, except rewriting the pickler derivation. After reading the upickle code, is that even feasible?

kciesielski commented 1 year ago

I see, thanks for the explanation regarding enumerations; let's use the terminology as you suggested. The discussion board you posted looks promising. I was able to find a fresh thread on refining types, which may be helpful for dealing with annotations. Working on this now.

adamw commented 1 year ago

I was thinking about a possible implementation strategy, and here's what I came up with.

The first constraint is that we should honor existing ReadWriter instances when they exist - either for the built-in types, or some esoteric ones.

The second constraint is that derivation should follow standard Scala practices, that is, be recursive - so that the derived typeclass for a product/coproduct is created using implicitly available typeclass instances for children. This rules out Codec as the typeclass, as it's not recursive - only the top-level instance for a type is available.
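A stand-in sketch (not upickle's API) of why recursion matters: the instance for a product is assembled from implicitly available child instances, so a hand-written instance for a field type is automatically honored by the derived one.

```scala
// Stand-in typeclass; an assumption for illustration only
trait Encode[T]:
  def apply(t: T): String

given Encode[Int] with
  def apply(i: Int): String = i.toString

// A hand-written, "esoteric" instance that derivation must reuse
given Encode[BigDecimal] with
  def apply(b: BigDecimal): String = s""""${b.toString}""""

case class Price(amount: BigDecimal, quantity: Int)

// What a recursively derived instance would roughly expand to:
// each field is encoded via the summoned child instance
given Encode[Price] with
  def apply(p: Price): String =
    s"""{"amount":${summon[Encode[BigDecimal]](p.amount)},"quantity":${summon[Encode[Int]](p.quantity)}}"""
```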

Picklers

Still, we need to derive both the ReadWriter instance and the Schema instance. So maybe we should do just that: derive that pair, with an option to convert to a Codec. E.g.:

case class Pickler[T](rw: ReadWriter[T], schema: Schema[T]):
  def toCodec: JsonCodec[T] = ??? // to be implemented: combine rw + schema

implicit def picklerToCodec[T](implicit p: Pickler[T]): JsonCodec[T] = p.toCodec

The Pickler name is quite random, but it's the best I came up with so far ;)

Configuration

Another design decision is what means of configuration to provide for the derived schemas/picklers. We already have two ways of customising schemas: using annotations and by modifying the implicit values. Originally I suggested adding a third one (explicitly providing an override for annotations), but maybe that's not necessary and we can use what's already available.

That is, the implicitly available Schema for a type could be used to guide the derivation of the ReadWriter - if it's missing. The schema already has all that we need: user-defined field names and default values. Btw., #2943 would be most useful here, to be able to externally provide alternate field names.
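A simplified, assumed stand-in for the schema's field metadata shows the idea: the ReadWriter derivation would consult this metadata to pick JSON field names (and defaults), instead of requiring a second, json-library-specific configuration.

```scala
// Hypothetical shapes, not tapir's actual Schema model
case class FieldMeta(name: String, encodedName: Option[String], default: Option[Any])
case class ProductMeta(fields: List[FieldMeta])

// The JSON field name is the schema-provided encoded name, if any,
// falling back to the case class field name
def jsonFieldNames(p: ProductMeta): List[String] =
  p.fields.map(f => f.encodedName.getOrElse(f.name))
```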

This also means that the Pickler derivation would have to assume that the schema's structure follows the type's structure (when it's a product/coproduct), and report an error otherwise.

Derivation

Now the main complication is implementing Pickler.derived[T]. I think it should follow more or less these rules:

Enums, inheritance

To support special cases, such as various enumerations or inheritance strategies, we can use a similar approach as currently, that is provide methods on Pickler to create the instances: Pickler.derivedEnumeration (similar to the methods on Schema and Codec), Pickler.oneOfUsingField, Pickler.oneOfWrapped (similar to those on Schema).

That way we would use the "standard" Scala way of configuring generic derivation - specifying the non-standard instances by hand - instead of inventing our own.

Runtime/compiletime

Using the schema to create the ReadWriter instance means that it would be created at run-time - as only then do we have access to the specific Schema instance (which might be user-provided and computed arbitrarily). So at compile-time, we would only generate code which would do the necessary lookups / create the computation.

Of course, there might be a hole in the scheme above and it might soon turn out that it's unimplementable ;) WDYT @kciesielski ?

kciesielski commented 1 year ago

Leaving some notes after our recent discussion with @adamw:

  1. The main API entrypoint is Pickler, and we want to allow deriving picklers without users providing schemas.
  2. If we allowed creating Pickler[T] with a user-provided Schema[T], we would break the mechanism of Pickler creating its own schema out of child schemas from summoned child picklers. That's why we emit a compilation error when a Schema is in scope, but no Reader/Writer. Either both the Schema and the ReadWriter are provided, or the Pickler takes care of deriving them.
  3. Therefore, to allow schema customization outside of case class annotations, we need some API in the Pickler, something like:
    Pickler.derivedCustomise[Person](
      _.age -> List(@EncodedName("x")),
      _.name -> List(@EncodedName("y"), @Default("adam")),
      _.address.street -> ...
    )
  4. This customisation DSL is then processed by the pickler to enrich the derived schemas, before the Readers/Writers are created (these use the schemas for encoded names and default values).
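A sketch of step 4, with all names hypothetical: the DSL's per-field overrides are folded into the derived field metadata before any Reader/Writer is built from it.

```scala
// Hypothetical representation of one override produced by the DSL
case class FieldOverride(path: String, encodedName: Option[String], default: Option[Any])

// Enrich the derived (field -> encoded name) mapping with the overrides;
// later overrides for the same path win, mirroring ordinary Map semantics
def applyOverrides(
    encodedNames: Map[String, String],
    overrides: List[FieldOverride]
): Map[String, String] =
  overrides.foldLeft(encodedNames) { (acc, o) =>
    o.encodedName.fold(acc)(n => acc.updated(o.path, n))
  }
```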

adamw commented 1 year ago

Yes, looks correct :) In the future we might also want to add Schema.derivedCustomise for consistency, and maybe deprecate the .modify variant of schema customisation then?

adamw commented 11 months ago

Reopening for possible jsoniter work