zio / zio-schema

Compositional, type-safe schema definitions, which enable auto-derivation of codecs and migrations.
https://zio.dev/zio-schema
Apache License 2.0

Schema isomorphism #164

Open jdegoes opened 2 years ago

jdegoes commented 2 years ago

Increasingly, it will be necessary for ZIO Schema to support schema-first approaches: for example, a new Scala project must interoperate with existing gRPC services, whose messages are defined using protobuf.

In such a case, the Scala-first approach supported by ZIO Schema will not work well, because it would require Scala developers to painstakingly create an ADT whose automatically derived Schema just "happens" to correspond to the schema-first schema.

A much better approach is for code-generators (or ideally, macros) to programmatically create a ZIO Schema. This ZIO Schema would ideally act as a canonical definition of the protocol, but without necessarily defining or constraining what user-defined ADT a Scala developer could use to interact with that protocol.

Thanks to migrations, this approach is already possible: code-gen could generate a ZIO Schema, and a user could then "migrate" that Schema to another one for a user-defined ADT. The drawback to this approach is that the migration cannot be known to be valid or invalid at compile-time: one must actually try it at runtime to see whether the migration is possible.
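As a rough, self-contained illustration of why that check only happens at runtime (Schema and migrate here are simplified stand-ins, not the exact zio-schema API):

object MigrationSketch {
  // simplified stand-in for a schema; the real zio-schema type carries much more structure
  trait Schema[A] {
    // attempt to build a conversion into another schema's type; whether the two
    // structures line up is only discovered when this is evaluated at runtime
    def migrate[B](that: Schema[B]): Either[String, A => Either[String, B]]
  }

  def wire[A, B](generated: Schema[A], userDefined: Schema[B]): Unit =
    generated.migrate(userDefined) match {
      case Left(error) => println(s"schemas do not line up: $error")
      case Right(_)    => println("migration available")
    }
}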

A way of pulling this validation forward to compile-time may be to introduce a notion of schema isomorphism. Independently, this concept has arisen in ZIO SQL, which @sviezypan is working on. In particular, Jaro created a type class to witness that a given schema has all the right types, in the right order, to allow inserting case classes into a relational database table.

Pulling this concept of compile-time "schema isomorphism" into ZIO Schema may enable schema-first usage without sacrificing the type safety and value-oriented features that ZIO Schema provides, while also enabling richer integration in libraries like ZIO SQL, without those libraries having to introduce their own more limited notions.

How might such a change be accomplished?

One idea is adding another type parameter to Schema, such that Schema[A] would become Schema[Abstract, Concrete]. The new type parameter would represent the structure of the data type, independent of ordering or specific materialization.

For example (assuming Scala 2.13 or higher, with singleton literal types):

type ->*[A, B] // a record field labelled by the singleton type A (product)
type ->+[A, B] // a case labelled by the singleton type A (sum)
// type &[A, B] = A with B

final case class Person(name: String, age: Int)

implicit val schema: Schema[("name" ->* String) & ("age" ->* Int), Person] = ...

With such a type parameter, which is not modified by Transform nodes, it becomes possible to do type-level comparisons and transformations on the structure of a schema. Moreover, it becomes possible in libraries like ZIO SQL to say, "I can work with any schema, so long as it has N fields of types X, Y, Z, ...".
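For instance, a ZIO SQL-style operation could then constrain callers by the abstract structure alone; here is a minimal, self-contained sketch of that idea (all names are hypothetical, not existing ZIO SQL API):

object StructuralConstraintSketch {
  type ->*[A, B]                   // labelled record field, as in the example above
  type &[A, B]                     // stand-in for intersection of fields
  trait Schema[Abstract, Concrete] // the proposed two-parameter Schema

  final case class Table(name: String)

  // accepts any concrete type A whose schema has exactly a "name": String field and
  // an "age": Int field, regardless of which case class the user chose to model them
  def insertInto[A](table: Table)(rows: List[A])(
    implicit schema: Schema[("name" ->* String) & ("age" ->* Int), A]
  ): Unit = ()
}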

This is an early draft and more work is required to make sure this direction is feasible, but other directions should also be considered so long as they satisfy these design goals.

jdegoes commented 2 years ago

I think we can view the algebraic definition of the type as a function of the Scala type, which means we can move the abstract type into the Schema trait, something like:

trait Schema[A] {
  type Abstract = ("name", String) & ("age", Int)
}

This is perhaps too specific; it could be generalized by using a concept similar to accessor builders:

trait Schema[A] {
  type MakeAbstract[MakeFieldType[NameType, ValueType]] = MakeFieldType["name", String] & MakeFieldType["age", Int]
}

This can be used to produce the tuple type, and indeed, probably should be for schema isomorphism, but it can be used for other purposes as well.

So we'd have something like:

trait Schema[A] {
  type MakeAbstract[MakeFieldType[NameType, ValueType]] = MakeFieldType["name", String] & MakeFieldType["age", Int]

  type Abstract = MakeAbstract[Tuple2]
}
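To make the "other purposes" concrete, the same MakeAbstract member can be instantiated with different field constructors; a small, self-contained sketch (names hypothetical):

object MakeAbstractSketch {
  type &[A, B] // stand-in for intersection

  trait PersonSchema {
    // the field-type constructor is supplied by the caller
    type MakeAbstract[MakeFieldType[_, _]] = MakeFieldType["name", String] & MakeFieldType["age", Int]
  }

  // the tuple-shaped abstract type: ("name", String) & ("age", Int)
  type AsTuples = PersonSchema#MakeAbstract[Tuple2]

  // the same structure reused for another purpose, e.g. as column descriptions
  trait Column[Name, Value]
  type AsColumns = PersonSchema#MakeAbstract[Column]
}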

Then, we can add a type alias in the companion object that can refine the type:

object Schema {
  type WithAbstract[A, MakeAbstract0[_[_, _]]] = Schema[A] { type MakeAbstract[F[_, _]] = MakeAbstract0[F] }
}

Next, we ensure that constructors for Schema define MakeAbstract appropriately, and that all method-based constructors (including macro-based derivation) return the WithAbstract type, in which the abstract type member is known and specified.
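As a rough sketch of what such a constructor could look like (hypothetical shapes, so the exact encoding may need adjusting), the point being that the return type keeps the abstract structure visible at the call site:

object RefinedConstructorSketch {
  type &[A, B] // stand-in for intersection

  trait Schema[A] { type MakeAbstract[MakeFieldType[_, _]] }
  object Schema {
    type WithAbstract[A, MakeAbstract0[_[_, _]]] =
      Schema[A] { type MakeAbstract[F[_, _]] = MakeAbstract0[F] }
  }

  final case class Person(name: String, age: Int)
  type PersonAbstract[F[_, _]] = F["name", String] & F["age", Int]

  // a derivation entry point whose return type exposes the abstract structure
  def personSchema: Schema.WithAbstract[Person, PersonAbstract] =
    new Schema[Person] { type MakeAbstract[F[_, _]] = PersonAbstract[F] }
}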

If we successfully propagate this information everywhere, then we can tell that two schemas are equivalent when they have the same abstract types. Possibly we can introduce a type class for this:

trait SchemaSubtype[S <: Schema[_], T <: Schema[_]]
object SchemaSubtype {
  implicit def schemaSubtype[L, R, A, B >: A]: SchemaSubtype[Schema.WithAbstract[L, A], Schema.WithAbstract[R, B]] = ...
}

This may not work exactly, but the idea is that we want to describe when a schema with abstract type A is a "subtype" of a schema with abstract type B (modelling this on <:< and =:= could be useful).

Then at compile time, we can know two things:

  1. Some schema A is "bigger" than another schema B (i.e., in theory, we can convert values of the bigger type to values of the smaller type)
  2. Some schema A is equivalent to schema B, i.e. it has the same information content, even if the types differ

This now allows projects like ZIO SQL to decode a value of type A into anything which is isomorphic to a tuple of the required types.
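As a sketch of what that could look like from ZIO SQL's side (SchemaIso and decodeRow are illustrative names, not existing API):

object IsoUseSketch {
  trait Schema[A]

  // compile-time evidence that two schemas carry the same structure
  // (same field names and types, possibly in a different order)
  trait SchemaIso[A, B]

  // decode a row, described by a schema for RowRepr, into any user-defined type A
  // for which the compiler can find an isomorphism witness
  def decodeRow[RowRepr, A](row: RowRepr)(
    implicit rowSchema: Schema[RowRepr], userSchema: Schema[A], iso: SchemaIso[RowRepr, A]
  ): A = ???
}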

jdegoes commented 2 years ago

One thing to figure out is how fancy to get with this: should we special-case Option? For any field of type Option[A], we don't really need a value of type A to compute it, since we can supply None. Similarly for List and other collections, which have a natural empty value. It's possible we should really have both AbstractMin and Abstract.

This would allow ZIO SQL to let users omit nullable fields; or allow converting from a smaller record to a bigger record, but only if the newly added fields are of type Option; etc.
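A small sketch of that distinction (hypothetical names), where fields with a natural empty are dropped from the minimal form:

object MinAbstractSketch {
  type ->*[A, B] // labelled record field, as in the earlier example
  type &[A, B]   // stand-in for intersection

  final case class User(name: String, nickname: Option[String])

  trait UserSchema {
    // full structure: every field appears
    type Abstract    = ("name" ->* String) & ("nickname" ->* Option[String])
    // minimal structure: optional fields may be omitted, since None can always be supplied
    type AbstractMin = "name" ->* String
  }
}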