Rethink Typeclass Derivation

odersky commented 5 years ago

I have opened this issue to collect ideas and requirements for how we want to evolve the typeclass derivation framework in https://dotty.epfl.ch/docs/reference/contextual/derivation.html. The goal is to come up with something lightweight that can be used as a basis for @milessabin's (and possibly others') designs for high-level typeclass derivation, and that can also stand on its own as a low-level derivation API.

At the SIP retreat, Miles presented his current design. It included a set of low-level "erased" abstractions that are implemented by compiler-generated code. These abstractions are quite similar in scope to what's supported by Generic and Mirror in the current implementation. But there are also differences. (I am writing this down as I recall it from the SIP retreat; please correct me where I am inaccurate.)

Differences

1. The erased API is typeless, having Any for all inputs and outputs. By contrast, the current API does expose types to some degree, even though it uses casts internally. This difference probably has to do with the fact that the erased API was intended for internal use only.
2. The erased API does not cover labels. Labels are treated only at the type level.
3. Erased API implementations are meant to be generated automatically for all case classes, case objects, and sealed classes and traits (and we'd have to extend that to all enum values as well). By contrast, the current implementation generates a Generic instance only if there is a derives clause. This is not a hard restriction, since Generic instances can also be generated after the fact if they are summoned as an implied instance. But that second way of doing it can lead to code duplication.
4. The erased API distinguishes between sums and products. In an ADT, each case gets its own variant of Generic, and the Generic for the overall sum type exposes a way to navigate to the Generic of the correct case. By contrast, the currently implemented API exposes a single Generic for the whole ADT.
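The first and fourth differences above are easier to see side by side in code. What follows is a minimal Scala 3 sketch of an untyped, "erased" API in which sums and products have separate instances. The trait and method names (`ErasedProduct`, `ErasedSum`, `construct`, `elem`, `ordinal`, `caseInstance`) and the hand-written instances are hypothetical and exist only for illustration. They are not the API Miles presented, whose exact shape is not given here.

```scala
// Hypothetical untyped ("erased") derivation API: every input and output is Any.
// All names here are invented for illustration.
trait ErasedProduct:
  def construct(elems: IndexedSeq[Any]): Any  // build a value from its fields
  def elem(x: Any, i: Int): Any                // read the i-th field of x

trait ErasedSum:
  def ordinal(x: Any): Int                     // index of the case x belongs to
  def caseInstance(i: Int): ErasedProduct      // navigate to that case's instance

sealed trait Shape
case class Circle(r: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

// One product instance per case (hand-written here; the compiler would generate them).
object CircleErased extends ErasedProduct:
  def construct(elems: IndexedSeq[Any]): Any = Circle(elems(0).asInstanceOf[Double])
  def elem(x: Any, i: Int): Any = x.asInstanceOf[Circle].productElement(i)

object RectErased extends ErasedProduct:
  def construct(elems: IndexedSeq[Any]): Any =
    Rect(elems(0).asInstanceOf[Double], elems(1).asInstanceOf[Double])
  def elem(x: Any, i: Int): Any = x.asInstanceOf[Rect].productElement(i)

// One sum instance that dispatches to the per-case product instances.
object ShapeErased extends ErasedSum:
  def ordinal(x: Any): Int = x match
    case _: Circle => 0
    case _: Rect   => 1
    case other     => throw IllegalArgumentException(s"not a Shape: $other")
  def caseInstance(i: Int): ErasedProduct =
    if i == 0 then CircleErased else RectErased

@main def erasedDemo(): Unit =
  val s: Shape = Rect(2.0, 3.0)
  val inst = ShapeErased.caseInstance(ShapeErased.ordinal(s))
  // Every result is Any, so the caller has to cast it back by hand.
  val width = inst.elem(s, 0).asInstanceOf[Double]
  println(width) // 2.0
```

Every result in this sketch is `Any`, so the caller needs an `asInstanceOf` to get `width` back as a `Double`. A published API would want to avoid exactly this.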

Discussion

Here are some comments on each of these four differences.

1. It's probably uncontentious that a published API should expose fine-grained types where possible, so that users' programs don't have to use asInstanceOf everywhere.
2. Labels should be treated only at the type level. At run time, we can already rely on getClass.getName and productElementNames, so no new functionality is needed.
3. It would be great if we could generate derivation infrastructure unconditionally for all sealed sum types (including enums), all case classes and objects, and all enum values. The limiting factor is the size of the generated code. The additional code we generate for a case class should be modest. In particular, it would be good if no additional class was generated. A case class already generates two JVM classes, one for the class itself and the other for its companion object. It would be problematic to unconditionally generate more classes that serve type class derivation. Also, the additional overhead to support an enum value should be close to zero. The current implementation does not fulfil this requirement, since each generated Generic is its own class, so generating them unconditionally for both a sum type and all its cases would lead to code bloat and code duplication.
4. If we want to generate derivation infrastructure unconditionally, we are forced to have separate infrastructure for sums and products to avoid code duplication.
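Existing members of every case class already cover the run-time side of labels, which is the point made about labels above. The sketch below uses only standard library members: `productElementNames`, which has been on `Product` since Scala 2.13, together with `productPrefix` and `getClass.getName`. The `Person` class is just an example.

```scala
// Run-time labels from members every case class already has; no new machinery.
case class Person(name: String, age: Int)

@main def labelsDemo(): Unit =
  val p = Person("Ada", 36)
  println(p.productElementNames.toList) // List(name, age)
  println(p.productPrefix)              // Person
  println(p.getClass.getName)           // fully qualified JVM class name
```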

Goal

So, the ideal API would be something like Miles' erased API, but with usable types and without needing to create extra classes for cases and enum values. The challenge is to come up with something along these lines.
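The sketch below is one way to picture this goal. It assumes two hypothetical traits, `ProductGeneric` and `SumGeneric`, whose names are not taken from any actual proposal. The companion object of each case class, which exists anyway, implements the product side with precise field types, and the sum side gets a single separate object. The instances are hand-written here; under the idea above they would be compiler-generated. This illustrates the idea under those assumptions and is not a proposed design.

```scala
// Hypothetical typed generic API; trait and member names are illustrative only.
trait ProductGeneric[T]:
  type Elems <: Tuple            // field types, visible to the type checker
  def build(elems: Elems): T     // construct a T without casts at the call site

trait SumGeneric[T]:
  def caseIndex(x: T): Int       // which case of the sum x belongs to

sealed trait Shape
case class Circle(r: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

// Products: the companion object that already exists doubles as the generic,
// so no additional JVM class is needed per case.
object Circle extends ProductGeneric[Circle]:
  type Elems = Tuple1[Double]
  def build(elems: Elems): Circle = Circle(elems._1)

object Rect extends ProductGeneric[Rect]:
  type Elems = (Double, Double)
  def build(elems: Elems): Rect = Rect(elems._1, elems._2)

// Sums: one separate object for the whole sealed hierarchy.
object ShapeGeneric extends SumGeneric[Shape]:
  def caseIndex(x: Shape): Int = x match
    case _: Circle => 0
    case _: Rect   => 1

@main def typedDemo(): Unit =
  val r: Rect = Rect.build((2.0, 3.0)) // typed result, no asInstanceOf needed
  println(ShapeGeneric.caseIndex(r))   // 1
```

Compared with an untyped API, the field types appear in `Elems`. A call such as `Rect.build((2.0, 3.0))` therefore type-checks directly, and a wrongly shaped tuple is rejected at compile time.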

plokhotnyuk commented 5 years ago

Would it be possible to derive type classes for arbitrary types (not only sums or products of case classes/objects)?

Can this API be used for recursive derivation of parsing and serializing codecs, as is currently possible with Scala 2.12 macros here?

odersky commented 5 years ago

I believe deriving type classes for other types is not in the scope of the current work. Recursion is fine; it is already supported in the examples we give.

jdegoes commented 5 years ago

I suggest taking a look at scalaz-deriving, which shows how derivation can be accomplished using a minimal feature set, by having the user merely provide an instance of a Deriving type class. Some details would have to be generalized, but it's very clean, user-friendly (want deriving for your type class? Just implement a type-safe instance of a deriving type class!), and type-safe.

odersky commented 5 years ago

@jdegoes: I did look at scalaz-deriving, as well as shapeless and magnolia. They are all much higher-level than what we try to achieve here. They can be more convenient for users wanting to write a new derivable type class, but they are vastly more complex. Our goal is to have something very low-level and simple that can be used as a substrate on which more elaborate derivation frameworks can be built. In particular, since we are talking about standard compiler generated code, there should be no dependencies on external libraries.

[The comment originally started with some remarks about problems caused by (3) and (4), which were not accurate. I have now found the real problem. It has to do with the difference between enums and case classes, namely the type of a case's apply method.]

odersky commented 5 years ago

run/typeclass-derivation2c.scala in #6218 is a worked-out strawman that implements suggestions (1)-(4). In particular:

The framework is typed (somewhat: implementers of derivable typeclasses still need to know what they are doing, but if they don't, they should not be in the business of writing typeclasses!)
Labels are compiletime only.
Generics for product types and singleton types do not need separate classes. They re-use the companion object as their implementation. Generics for sum types do need a separate class, but that's less of a problem.
Sums and products have separate infrastructure.

Given this strawman, I believe the following is a feasible generation strategy:

We always generate generic infrastructure for singletons and products. This covers case classes, case objects, and enum cases. The bytecode price to pay for this is very reasonable: a trivial method to get the generic instance and, in the case of product types, a fromProduct method, which is also quite small.
We always generate generic infrastructure for enums.
We generate generic infrastructure for sealed classes and traits with case class/object children if a derives clause is present.
An implicit instance of Generic[T] can still be obtained for other sealed classes and traits, but this entails the creation of an anonymous class at the call site, so it leads to larger code.
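For comparison, the derivation support that Scala 3 ships lives in `scala.deriving.Mirror`. It has separate `Mirror.ProductOf` and `Mirror.SumOf` views, and it exposes field labels only as types (`MirroredElemLabels`). The following is a minimal sketch assuming Scala 3. `Shape`, `Circle`, `Rect` and the helpers `labelsOf` and `fieldLabels` are illustrative and not part of the library. None of these mirrors needs a `derives` clause in order to be summoned.

```scala
import scala.compiletime.{constValue, erasedValue}
import scala.deriving.Mirror

sealed trait Shape
case class Circle(r: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

// Turns a tuple of string literal types, e.g. ("w", "h"), into a List[String].
inline def labelsOf[L <: Tuple]: List[String] =
  inline erasedValue[L] match
    case _: EmptyTuple => Nil
    case _: (h *: t)   => constValue[h].toString :: labelsOf[t]

// Labels exist only at the type level (MirroredElemLabels) until we ask for them.
inline def fieldLabels[T](using m: Mirror.ProductOf[T]): List[String] =
  labelsOf[m.MirroredElemLabels]

@main def mirrorDemo(): Unit =
  // Mirrors for case classes and sealed traits are provided by the compiler on demand.
  val rectMirror = summon[Mirror.ProductOf[Rect]]
  val r: Rect = rectMirror.fromProduct((2.0, 3.0)) // typed result, no cast
  val shapeMirror = summon[Mirror.SumOf[Shape]]
  println(shapeMirror.ordinal(r))   // index of Rect among Shape's cases
  println(fieldLabels[Rect])        // List(w, h)
```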

The reason why we should not generate generic infrastructure for all sealed classes and traits is two-fold:

It's sometimes impossible, for instance if there are children in deeply nested scopes, or if children are defined in their own separate objects. In that case we might get an error saying that children were added after they were computed. We do not want to rule these patterns out a priori just because we cannot construct the generic infrastructure for them.
It might lead to the generation of code that's unused.

OlivierBlanvillain commented 5 years ago

I think we can close this issue now that the typeclass derivation infrastructure has stabilized.
