Revive or replace @specialized?

odersky commented 2 years ago

Specialization was rather messy in Scala 2. It had a complicated implementation and hard to understand restrictions and inefficiencies for the user. Nevertheless, when used right, specialization can gain significant performance improvements, which are still unavailable in Scala 3. I am opening this issue to start a discussion what one could do to achieve the essential benefits of specialization in Scala 3, while hopefully avoiding some of its shortcomings.

Pain Points of Scala 2 Specialization

Here are some point points of Scala 2 specialization as far as I can remember them. It might be slightly off or incomplete. Please fill in details that are missing.

Irregular class inheritance. When extending a specialized class, we really need multiple inheritance. For instance, say class A is specialized, class B extends A, then we have the following extension relationship between A, B and specialized versions A_Int, B_Int:
```
     B
   /   \
 A     B_Int
   \   /
   A_Int
```
Since multiple inheritance of classes is not supported by the JVM, specialization will in this case drop the edge from A_Int to B_Int, meaning that some natural specializations are lost.
Duplicate class fields. If A[T] contains a field x: T, then A_Int needs to contain a field x_Int: Int. Since A_Int is a subclass of A, we end up with two sets of fields, one of them unused.
Possible code explosion. Since specialized variants of classes and methods have to be compiled ahead of time, we can get many variants to compile: 9 for a single specialized parameter, 81 for two parameters, and so on. This quickly gets out of hand, leading to code bloat and long compile times. As a mitigation, we can restrict the set of specialized types in the @specialized annotation. There's also the experimental miniboxing work where instead of 9 specialization targets we only have 3 (Long, Double, and Object). The main problem with miniboxing is that without further complications is gets inefficient for defining arrays.
Complicated implementation, to a large degree forced by the complications described above.

Elements of a Redesign

Given the problems with 1. and 2. above I think we should avoid specialized classes. We could instead experiment with some combination of the following ideas.

Inline Traits

An inline trait is defined like a normal trait with an inline modifier.

inline trait MatrixLib[A]:
  opaque type Matrix = ...
  extension (x: Matrix)
     def * (y: Matrix): Matrix = ...
     ...

An inline trait is expanded when it is inherited by an object or class (not by another trait). Example:

object FloatMatrices extends MatrixLib[Float]   // MatrixLib inlined and specialized to `Float` here.

An inline trait itself translates to a pure interface. All its contents are inlined each time the trait is extended by a class or object. Inline traits avoid all of the pain points of Scala 2 specialization. They can do more than primitive specialization since they also specialize on value parameters and reference types. This helps avoid megamorphic dispatch. Inline traits also profit from all the optimizations available for inline methods, including inline matches, summonFrom, embedded splices. Indeed the analogy of inline traits and inline methods is strong: inline calls correspond to supercalls in extending classes and objects, inline parameters are inline parameters of the trait, inline traits can have nested inline matches, etc. Inline traits should not be too hard to implement, since they can probably draw on much of the implementation for method inlining.

Inline trait expansions are only generated on demand when a class or object extends an inline trait. This avoids the up-front cost of creating specialized copies which might never be needed. On the other hand, one needs to be more careful about de-duplication. If an inline trait is expanded with the same arguments twice, we get two expansions.

I see inline traits primarily as building blocks for large generic, specializable libraries. Typically, an application would instantiate such a library only once or a small number of times.

Compared to specialization, inline traits have one shortcoming, namely that interfaces are not specialized. If some code is parameterized by MatrixLib[T] for type parameter T that code will use unspecialized interfaces (which are implemented via bridge methods using standard erasure). Unspecialized interfaces might impose a boxing/unboxing overhead. This leads us to consider also the following extensions.

Specialized Traits

We could do Scala 2 style specialization for traits only. This avoids pain points (1) and (2) and could simplify the implementation, in particular since we could probably make use of the inlining improvements. We can also combine specialized and inline for traits. An inline trait with specialized parameters translates to the specialization of a pure interface. This means a group of interfaces with bridge methods, which is relatively lightweight. So making specialized traits inline can reduce the up-front code generation overhead.

Specialized Methods

Say we have an inline trait like MatrixLib. Consider a factory method for MatrixLib like this:

object MatrixLib:
  def apply[T](...): MatrixLib[T] = new MatrixLib[T](...) {}

Unfortunately, that would instantiate MatrixLib at generic T, without any gain in customization (and without any additional code overhead either). We can get better customization by making the apply an inline method:

object MatrixLib:
  inline def apply[T](...): MatrixLib[T] = new MatrixLib[T](...) {}

Now, we get a new, specialized copy of MatrixLib at each call of apply. Whether this is good or bad depends on how many call sites there are. But if we had specialized methods, we could also do this:

object MatrixLib:
  def apply[@specialized(Int, Float, Double) T](...): MatrixLib[T] = new MatrixLib[T](...) {}

This will produce three copies of apply for matrix libraries over Ints, Floats or Doubles.

Summary

Inline traits cover some of the use cases of Scala 2 specialization with completely different tradeoffs, and potentially more opportunities for optimization and customization. They can be complemented with specialized traits and methods which each addresses one issue of inline traits:

specialized traits mean that efficiency is not lost at interfaces,
specialized methods mean that we can tailor the set of expansions of inline traits to support a specific set of primitive types, without risk of duplicate expansions, by using a specialized factory method.

odersky commented 2 years ago

After discussion with @sjrd @nicolasstucki and @mbovel the sentiment was that we want to do the bare minimum to keep things simple and understandable. In particular, maybe @specialized methods are not needed; since we can just write out the concrete instantiation objects ourselves. E.g. instead of the specialized apply method:

def apply[@specialized(Int, Float, Double) T](...): MatrixLib[T] = new MatrixLib[T](...) {}

just write

object IntMatrices extends MatrixLib[Int] 
object FloatMatrices extends MatrixLib[Float] 
object DoubleMatrices extends MatrixLib[Double]

Sure, it's code duplication, but it makes it easier to see where the code is generated.

On specialized traits, I thought of a simplification that might be worthwhile to look into. Consider a specialized trait like

inline trait Ref[@specialized A]:
  def set(x: A): Unit
  def get(): A

That specialized trait generates 10 subtraits

trait Ref_Byte extends Ref[Byte]
trait Ref_Short extends Ref[Short]
...
trait Ref_Double extends Ref[Double]
trait Ref_Object extends Ref[Object]

So far, so good. But each of these subtraits as well as the original trait Ref itself gets 10 versions of each defined method, for an overall 100 fold increase in size!

We can reduce this code bloat by requiring that a specialized inline trait is only instantiated at the supported subtypes. So an extension of Ref above could instantiate A to any primitive type or (any subtype of) Object, but not to Any or to a completely unconstrained type variable. Or, if MatrixLib was defined like this

inline trait MatrixLib[@specialized(Int, Float, Double) A]

then the only legal extensions of MatrixLib are those that instantiate A to Int, Float, or Double.

This means that every extension of a specialized inline trait T[X] could be translated to extend one of its specialized subtraits T_X. Therefore erasure is allowed to map every occurrence of a type such as T[X] where X is one of the possible specialization types of T to T_X, and insert casts between T and T_X as needed.

This is a big win since the new specialized traits can be defined very straightforwardly. For instance, here is Ref_Int:

trait Ref_Int extends Ref[Int]:
  override def set(x: Int): Unit
  override def get(): Int

There is no 10 fold explosion of generated methods. Erasure as it is will already do the rest: rebind method calls to the more efficient unboxed version and insert bridge methods as needed.

So with this new scheme, we get three benefits:

Drastic reduction of the number of generated methods
Re-use of existing functionality in erasure
Prevent "useless" instantiations that generate copies without gain.

smarter commented 2 years ago

So far, so good. But each of these subtraits as well as the original trait Ref itself gets 10 versions of each defined method, for an overall 100 fold increase in size!

I'm not sure I follow, why would each trait get 10 versions of each method if A is instantiated to one specific type?

odersky commented 2 years ago

@smarter You are right. The base version A gets 10 versions of each method, but each subtrait overrides only two of these and inherits the rest. It's still a lot of code, and code that's specific to specialization (we can't use a pre-existing scheme to generate it).

odersky commented 2 years ago

Here's a refinement on specialization. Instead of an annotation @specialized, define a type class trait

trait Specialized[T] extends ClassTag[T]

Specialized instances are ClassTags that have a field tag: Int with values 0 for Object and 1 to 9 for the primitive types. Specialized instances are synthesized for all reference types and all primitive types (status of value classes is not yet clear).

Inline traits with Specialized type parameters generate sub-interfaces as described in https://github.com/lampepfl/dotty/issues/15532#issuecomment-1167601366.

An object or class can then extend a specializable trait only with type arguments that are Specialized. Either the type argument is a reference type or a primitive type, in which case we know all we need to create the inline trait instance. Or it is itself a type parameter X with Specialized context bound $ev. In that case we only permit creation of an anonymous class instance. To translate that instance, we generate a switch like this:

($ev.tag match
   case 0 => new T[Object] {}
   case 1 => new T[Byte] {}
   ...
   case 9 => new T[Unit] {}
).asInstanceOf[T[X]]

Here's a little implementation of a specialized array buffer using that technique. Note that the same functionality cannot be achieved with Scala 2 specialization, because we cannot specialize on Object there.

inline trait Buffer[T: Specialized] private ():
  private var elems: Array[T] = new Array[T](16)
  private var len = 0

  def += (x: T): this.type =
    if len == elems.size then
      val newElems = new Array[T](len * 2)
      elems.copyToArray(newElems)
      elems = newElems
    elems(len) = x
    len += 1
    this

  def ++= (xs: T*): this.type =
    xs.foldLeft(this)(_ += _)

  def head: T =
    assert(len > 0)
    elems(0)

  def iterator: Iterator[T] =
    elems.iterator.take(len)

  def foreach(op: T => Unit): Unit =
    for i <- 0 until len do op(elems(i))

  def size: Int = len

object Buffer:
  def apply[T: Specialized](xs: T*): Buffer[T] = 
    (new Buffer[T] {}).++=(xs*)

odersky commented 2 years ago

A refinement to optionally allow only a subset of specialization types is left for future work.

odersky commented 2 years ago

Alternative to generalize the "split-by-specialized-tag" technique: Have an apply method in object Specialized that is handled specially by the compiler.

object Specialized:
  def apply[T: Specialized, F[_])(x: => F[T]): F[T]

with

Specialized[T, F](body)($ev) 
  --->
    ($ev.tag match
      case 0 => [T := Object](body)
      case 1 => [T := Byte](body)
      ...
      case 9 => [T := Unit](body)
    ).asInstanceOf[F[T]]

In that case the builder method of Buffer would look like this:

def apply[T: Specialized](xs: T*): Buffer[T] = 
   Specialized(new Buffer[T] {}).++=(xs*)

odersky commented 2 years ago

To summarize: The new design has three elements

Inline traits, which allow to customize large parts of object constructions,
a Specialized type class trait, which transports knowledge about primitive vs reference types,
a Specialized(...) application, which splits a generic code block according to a specialized type parameter.

inline traits and Specialized interact in that we know we can create specialized subtraits T_X for inline traits and have erasure cast from T[X] to T_X.

odersky commented 2 years ago

Brainstorming session result of how to inline Numeric ops:

inline trait Numeric[T]:
  inline def plus(x: T, y: T): T

object NumericInt extends Numeric_Int:
  inline def plus(x: Int, y: Int): Int = x + y

inline trait A[T]:
  transparent inline given n: Numeric[T] = summonInline
  def f(x: T): T = n.plus(x, x)

class B extends A[Int]:
  type T = Int                            // generated
  transparent inline given n: Numeric[T] = NumericInt  // generated
  override def f(x: T): T = n.plus(x, x)       // generated

nicolasstucki commented 2 years ago

More detailed version

inline trait Numeric[T: Specialized]:
  inline def plus(x: T, y: T): T
  def times(x: T, y: T): T

// generated
trait Numeric_Int extends Numeric[Int]/*removed*/:
  inline def plus(x: Int, y: Int): Int
  def times(x: Int, y: Int): Int

object NumericInt extends Numeric[Int]/*erases to Numeric_Int*/:
  inline def plus(x: Int, y: Int): Int = x + y
  def times(x: Int, y: Int): Int = x * y

inline trait A[T]:
  transparent inline given n: Numeric[T] = summonInline
  def f(x: T): T = n.times(n.plus(x, x), x)

class B extends A[Int]/*removed*/::
  type T = Int                            // generated
  transparent inline given n: Numeric[T] = NumericInt  // generated
  override def f(x: T): T = 
     n.times(n.plus(x, x), x)       // generated
     // then inlined to
     // NumericInt.times(x + x, x)

odersky commented 1 year ago

An alternative, more expressive and convenient way to do specialization with inline traits is described in https://github.com/lampepfl/dotty/discussions/18044

scala / scala3