Open odersky opened 2 years ago
After discussion with @sjrd @nicolasstucki and @mbovel the sentiment was that we want to do the bare minimum to keep things simple and understandable. In particular, maybe @specialized
methods are not needed; since we can just write out the concrete instantiation objects ourselves. E.g. instead of the specialized apply method:
def apply[@specialized(Int, Float, Double) T](...): MatrixLib[T] = new MatrixLib[T](...) {}
just write
object IntMatrices extends MatrixLib[Int]
object FloatMatrices extends MatrixLib[Float]
object DoubleMatrices extends MatrixLib[Double]
Sure, it's code duplication, but it makes it easier to see where the code is generated.
On specialized traits, I thought of a simplification that might be worthwhile to look into. Consider a specialized trait like
inline trait Ref[@specialized A]:
def set(x: A): Unit
def get(): A
That specialized trait generates 10 subtraits
trait Ref_Byte extends Ref[Byte]
trait Ref_Short extends Ref[Short]
...
trait Ref_Double extends Ref[Double]
trait Ref_Object extends Ref[Object]
So far, so good. But each of these subtraits as well as the original trait Ref
itself gets 10 versions of each defined method, for an overall 100 fold increase in size!
We can reduce this code bloat by requiring that a specialized inline trait is only instantiated at the supported subtypes. So an extension of Ref
above could instantiate A
to any primitive type or (any subtype of) Object
, but not to Any
or to a completely unconstrained type variable. Or, if MatrixLib
was defined like this
inline trait MatrixLib[@specialized(Int, Float, Double) A]
then the only legal extensions of MatrixLib
are those that instantiate A
to Int
, Float
, or Double
.
This means that every extension of a specialized inline trait T[X]
could be translated to extend one of its specialized subtraits T_X
. Therefore erasure is allowed to map every occurrence of a type such as T[X]
where X
is one of the possible specialization types of T
to T_X
, and insert casts between T
and T_X
as needed.
This is a big win since the new specialized traits can be defined very straightforwardly. For instance, here is Ref_Int
:
trait Ref_Int extends Ref[Int]:
override def set(x: Int): Unit
override def get(): Int
There is no 10 fold explosion of generated methods. Erasure as it is will already do the rest: rebind method calls to the more efficient unboxed version and insert bridge methods as needed.
So with this new scheme, we get three benefits:
So far, so good. But each of these subtraits as well as the original trait Ref itself gets 10 versions of each defined method, for an overall 100 fold increase in size!
I'm not sure I follow, why would each trait get 10 versions of each method if A is instantiated to one specific type?
@smarter You are right. The base version A
gets 10 versions of each method, but each subtrait overrides only two of these and inherits the rest. It's still a lot of code, and code that's specific to specialization (we can't use a pre-existing scheme to generate it).
Here's a refinement on specialization. Instead of an annotation @specialized
, define a type class trait
trait Specialized[T] extends ClassTag[T]
Specialized
instances are ClassTag
s that have a field tag: Int
with values 0
for Object
and 1
to 9
for the primitive types. Specialized
instances are synthesized for all reference types and all primitive types (status of value classes is not yet clear).
Inline traits with Specialized
type parameters generate sub-interfaces as described in https://github.com/lampepfl/dotty/issues/15532#issuecomment-1167601366.
An object or class can then extend a specializable trait only with type arguments that are Specialized
.
Either the type argument is a reference type or a primitive type, in which case we know all we need to create the
inline trait instance. Or it is itself a type parameter X
with Specialized
context bound $ev
. In that case we only
permit creation of an anonymous class instance. To translate that instance, we generate a switch like this:
($ev.tag match
case 0 => new T[Object] {}
case 1 => new T[Byte] {}
...
case 9 => new T[Unit] {}
).asInstanceOf[T[X]]
Here's a little implementation of a specialized array buffer using that technique. Note that the same functionality cannot be achieved with Scala 2 specialization, because we cannot specialize on Object there.
inline trait Buffer[T: Specialized] private ():
private var elems: Array[T] = new Array[T](16)
private var len = 0
def += (x: T): this.type =
if len == elems.size then
val newElems = new Array[T](len * 2)
elems.copyToArray(newElems)
elems = newElems
elems(len) = x
len += 1
this
def ++= (xs: T*): this.type =
xs.foldLeft(this)(_ += _)
def head: T =
assert(len > 0)
elems(0)
def iterator: Iterator[T] =
elems.iterator.take(len)
def foreach(op: T => Unit): Unit =
for i <- 0 until len do op(elems(i))
def size: Int = len
object Buffer:
def apply[T: Specialized](xs: T*): Buffer[T] =
(new Buffer[T] {}).++=(xs*)
A refinement to optionally allow only a subset of specialization types is left for future work.
Alternative to generalize the "split-by-specialized-tag" technique: Have an apply
method in object Specialized
that is
handled specially by the compiler.
object Specialized:
def apply[T: Specialized, F[_])(x: => F[T]): F[T]
with
Specialized[T, F](body)($ev)
--->
($ev.tag match
case 0 => [T := Object](body)
case 1 => [T := Byte](body)
...
case 9 => [T := Unit](body)
).asInstanceOf[F[T]]
In that case the builder method of Buffer
would look like this:
def apply[T: Specialized](xs: T*): Buffer[T] =
Specialized(new Buffer[T] {}).++=(xs*)
To summarize: The new design has three elements
Specialized
type class trait, which transports knowledge about primitive vs reference types,Specialized(...)
application, which splits a generic code block according to a specialized type parameter.inline traits and Specialized
interact in that we know we can create specialized subtraits T_X
for inline traits and have erasure cast from T[X]
to T_X
.
Brainstorming session result of how to inline Numeric ops:
inline trait Numeric[T]:
inline def plus(x: T, y: T): T
object NumericInt extends Numeric_Int:
inline def plus(x: Int, y: Int): Int = x + y
inline trait A[T]:
transparent inline given n: Numeric[T] = summonInline
def f(x: T): T = n.plus(x, x)
class B extends A[Int]:
type T = Int // generated
transparent inline given n: Numeric[T] = NumericInt // generated
override def f(x: T): T = n.plus(x, x) // generated
More detailed version
inline trait Numeric[T: Specialized]:
inline def plus(x: T, y: T): T
def times(x: T, y: T): T
// generated
trait Numeric_Int extends Numeric[Int]/*removed*/:
inline def plus(x: Int, y: Int): Int
def times(x: Int, y: Int): Int
object NumericInt extends Numeric[Int]/*erases to Numeric_Int*/:
inline def plus(x: Int, y: Int): Int = x + y
def times(x: Int, y: Int): Int = x * y
inline trait A[T]:
transparent inline given n: Numeric[T] = summonInline
def f(x: T): T = n.times(n.plus(x, x), x)
class B extends A[Int]/*removed*/::
type T = Int // generated
transparent inline given n: Numeric[T] = NumericInt // generated
override def f(x: T): T =
n.times(n.plus(x, x), x) // generated
// then inlined to
// NumericInt.times(x + x, x)
An alternative, more expressive and convenient way to do specialization with inline traits is described in https://github.com/lampepfl/dotty/discussions/18044
Specialization was rather messy in Scala 2. It had a complicated implementation and hard to understand restrictions and inefficiencies for the user. Nevertheless, when used right, specialization can gain significant performance improvements, which are still unavailable in Scala 3. I am opening this issue to start a discussion what one could do to achieve the essential benefits of specialization in Scala 3, while hopefully avoiding some of its shortcomings.
Pain Points of Scala 2 Specialization
Here are some point points of Scala 2 specialization as far as I can remember them. It might be slightly off or incomplete. Please fill in details that are missing.
Since multiple inheritance of classes is not supported by the JVM, specialization will in this case drop the edge from A_Int to B_Int, meaning that some natural specializations are lost.
A[T]
contains a fieldx: T
, thenA_Int
needs to contain a fieldx_Int: Int
. SinceA_Int
is a subclass ofA
, we end up with two sets of fields, one of them unused.@specialized
annotation. There's also the experimental miniboxing work where instead of 9 specialization targets we only have 3 (Long, Double, and Object). The main problem with miniboxing is that without further complications is gets inefficient for defining arrays.Elements of a Redesign
Given the problems with 1. and 2. above I think we should avoid specialized classes. We could instead experiment with some combination of the following ideas.
Inline Traits
An inline trait is defined like a normal trait with an
inline
modifier.An inline trait is expanded when it is inherited by an object or class (not by another trait). Example:
An inline trait itself translates to a pure interface. All its contents are inlined each time the trait is extended by a class or object. Inline traits avoid all of the pain points of Scala 2 specialization. They can do more than primitive specialization since they also specialize on value parameters and reference types. This helps avoid megamorphic dispatch. Inline traits also profit from all the optimizations available for inline methods, including inline matches, summonFrom, embedded splices. Indeed the analogy of inline traits and inline methods is strong: inline calls correspond to supercalls in extending classes and objects, inline parameters are inline parameters of the trait, inline traits can have nested inline matches, etc. Inline traits should not be too hard to implement, since they can probably draw on much of the implementation for method inlining.
Inline trait expansions are only generated on demand when a class or object extends an inline trait. This avoids the up-front cost of creating specialized copies which might never be needed. On the other hand, one needs to be more careful about de-duplication. If an inline trait is expanded with the same arguments twice, we get two expansions.
I see inline traits primarily as building blocks for large generic, specializable libraries. Typically, an application would instantiate such a library only once or a small number of times.
Compared to specialization, inline traits have one shortcoming, namely that interfaces are not specialized. If some code is parameterized by
MatrixLib[T]
for type parameterT
that code will use unspecialized interfaces (which are implemented via bridge methods using standard erasure). Unspecialized interfaces might impose a boxing/unboxing overhead. This leads us to consider also the following extensions.Specialized Traits
We could do Scala 2 style specialization for traits only. This avoids pain points (1) and (2) and could simplify the implementation, in particular since we could probably make use of the inlining improvements. We can also combine specialized and inline for traits. An inline trait with specialized parameters translates to the specialization of a pure interface. This means a group of interfaces with bridge methods, which is relatively lightweight. So making specialized traits
inline
can reduce the up-front code generation overhead.Specialized Methods
Say we have an inline trait like
MatrixLib
. Consider a factory method forMatrixLib
like this:Unfortunately, that would instantiate
MatrixLib
at genericT
, without any gain in customization (and without any additional code overhead either). We can get better customization by making theapply
aninline
method:Now, we get a new, specialized copy of
MatrixLib
at each call ofapply
. Whether this is good or bad depends on how many call sites there are. But if we had specialized methods, we could also do this:This will produce three copies of
apply
for matrix libraries overInt
s,Float
s orDouble
s.Summary
Inline traits cover some of the use cases of Scala 2 specialization with completely different tradeoffs, and potentially more opportunities for optimization and customization. They can be complemented with specialized traits and methods which each addresses one issue of inline traits: