Should we bring back rewrite methods?

lampepfl / dotty-feature-requests

Historical feature requests. Please create new feature requests at https://github.com/lampepfl/dotty/discussions/new?category=feature-requests

31 stars 2 forks source link

Should we bring back rewrite methods? #56

Open odersky opened 5 years ago

odersky commented 5 years ago

At the SIP meeting several people felt that rewrite methods were a rather natural way to express typelevel programming without having to breakout to the meta level. We also identified two areas where we don't currently have a solution and rewrite methods look like they could fit the bill:

Typelevel stuff where synthesized terms are dependent on implicit search. Without a mechanism like implicit matches, we are forced to stay in logic programming mode forever. I can see a specific use case in typeclass derivation.
String interpolators in patterns. We need a way here to derive an expected type for an embedded pattern from the contents of a string. This needs a form of whiteboxity. Rewrite methods could do it in principle, whether at acceptable complexity remains to be seen.

We removed rewrite methods in #5109, #5138 since the specification and implementation problems seemed too hard. In particular:

it was very tricky to produce untyped trees with typed splices,
we had to pickle untyped as well as typed trees,
the scheme to package implicits to come with untyped trees was quite complex,
using rewritten terms to implicitly define types was a performance bottleneck once the defined types got large (i.e. 500+ elements).
the fact that inlined code could fail in typer was unsettling
the last point also implicitly moves the type checker into the APIs of libraries. Subtle changes in the typechecker might break inlined code that worked with an earlier version.

But maybe we can change the spec and implementation scheme to address these issues? Here's an idea for morphed rewrite methods, which I suggest should continue to carry the inline modifier instead of rewrite, as was the case formerly.

Inline their bodies as typed trees, as is the case for regular inline methods now
The right hand side of an inline method may be one of three special match expressions: inline match, type match, implicit match.
An inline match is evaluated at inlining time, just like we used to handle rewrite matches. The language of patterns available in an inline match might be restricted to make such evaluation predictable. Inline matches must rewrite to an alternative. The rewritten alternative is subject to some substitutions or wrappings that map pattern-bound variables to parts of the scrutinee. This can affect terms as well as types.
The type of an inlined expression is the type of the rewritten body.
Implicit matches and type matches are as described before, except that they again work with typed trees.
We keep match types to define computed types; no need to rely on term inlining to do this.

So, in essence, this is like the previous rewrite methods, except that instead of re-typing trees from scratch at the inlined position, we take the typed body and specialize it. Specialization is by picking an alternative in a match, by substituting parts of the scrutinee for pattern-bound variables, and by substituting actual for formal parameters.

This new version of inline methods is different from the currently implemented one in that the type of an inlined expression is in general a subtype of the inline method's return type. We can accommodate both versions by adopting a syntax element from match types.

inline def f(...): T = ...     // type of inlined expression is always T
inline def g(...) <: U = ...   // type of inlined expression is a subtype of U

Before embarking on fleshing this out, which - even accounting for previous work - will be a large effort, it would be good to discuss whether we need this, and to look at all the possible reasons why we should not do this. So, if you can contribute to the discussion, please do!

jducoeur commented 5 years ago

From the peanut gallery: I can't claim to understand all the implications, but I really like the rewrite method concept -- I find it far more intuitive than the logic-programming approach.

And I love the <: syntax here -- I think it's actually much clearer than the way the proposal worked before. It provides really nice glanceability.

nicktelford commented 5 years ago

If the <: T syntax is only permitted on inline methods, that means there will be no way to ascribe an explicit type declaration to regular methods who's type depends on an inline method; instead, we'd have to rely on inference, which is fine, but explicit type declarations are often desirable for documenting the type, especially on public methods.

Example:

inline def f(...): <: T = ???

def g(...) = f(...) // no way to add explicit return type
def h(...) = g(...) // still no way to add explicit return type
val x = h(...) // no way to add explicit type annotation

Perhaps this is fine, it's just something I wanted to highlight.

odersky commented 5 years ago

def g(...) = f(...) // no way to add explicit return type

I don't see why we can't add an explicit return type to g. We'd have to know what f returns of course, and the type could be something that's computed itself, like a match type.

propensive commented 5 years ago

@odersky Thanks for putting more thought into interpolated patterns after we spoke last week. I use this very frequently in Kaleidoscope to pattern match on Strings with regular expressions, and to extract capturing groups in the regex.

In this particular case, the interpolated values only ever extract to Strings themselves, so the amount of "whiteboxity" necessary is limited to the arity, which should be a much simpler problem to solve than full support for returning precise subtypes.

It would be great to support heterogeneous type extraction, e.g.

xml match { case xml"<$tag param=$value/>" => ... }

but that would require parsing the XML to calculate the types of tag and value, and I don't know how feasible that is once the current power of Scala 2's whitebox macros is replaced with Scala 3's metaprogramming facilities.

odersky commented 5 years ago

@propensive I think these are good use cases. To tackle these problems it would help to get better constant folding for strings. E.g. we could treat any referentially transparent java.lang.String operation on constant arguments as a constant by evaluating it at compile-time. Or something like that. Problem is we have to spec anything we support, it's implicitly part of the language.

propensive commented 5 years ago

Yes, that's more or less the conclusion I was hoping for. :)

odersky commented 5 years ago

We now have an initial implementation of the new specializing inline method scheme: #5392.

Here's a first evaluation:

The new scheme is not quite as expressive as the old rewrite methods, but much simpler. it was possible to get significant functionality without having to re-typecheck untyped code. The one thing that is mostly out of reach are recursive implementations of generic typelevel functions like concat or toNat in docs/typelevel.md. The problem here is that these functions involve a step to infer type arguments of a recursive call. These are inferred to be too generic in the inline definition, and the instantiation is not re-done on expansion. We can circumvent the problem using match types, since these can provide types that are already good a priori and therefore don't need a retyping during expansion.
Integration with implicit search should work well once #5405 is fixed.
There's still a lot of uncertainty about what can be done and what cannot. The problem is that we do partial evaluation by small-step rewriting here. The rewriting scheme is extremely limited compared to the full language. Lots of "obvious" simplifications are missing. So one is forced to experiment a lot, trying out different formulations of an inline method to see which one rewrites as requested.
The overlap with staging has become a lot larger. For instance, the implementation of Tuple in #5392 contains a switch that chooses between specialization through inline matches or specialization through staging. The two methods are largely equivalent.

--

odersky commented 5 years ago

In light of point 4 above, it's interesting to do a detailed comparison between staging (i.e. quotes and splices) and partial evaluation (i.e. specializing inline, inline match, inline if).

Staging uses the full language to compute new terms, partial evaluation uses only a subset that is expressed by rewrite rules.
Staging clearly distinguishes code (that rewrites programs) from data (the program parts that get rewritten). Partial evaluation doesn't, but there's a safety net that ensures that every inline if and inline match is rewritten. That gives some feedback if something does not rewrite as expected.
Partial evaluation can specialize types, but staging can't since it runs after typer.
Partial evaluation integrates well with implicit search. It invokes implicit searches that can in turn compute types that guide further rewrites. Staging can invoke implicit search, but, again, that search cannot influence types, so is much less powerful.
Staging requires macro code to be compiled in separate projects. Partial evaluation makes no such requirement.

In summary, staging is ideal for optimizing code and for mapping it to new backends. Partial evaluation is currently the only means available for doing typelevel computations. Partial evaluation feels simpler (when it works) since it works on the language level, not on the meta level, and does not require pre-compilation. Staging feels more robust since it is clear what gets evaluated.

odersky commented 5 years ago

But maybe it is possible to combine the advantages of partial evaluation and staging in a single scheme? Here's an outline of what a solution would have to look like:

Staging needs to be done during typer instead being a separate phase.
Staging needs to be able to specialize types. I.e. a top-level splice of an Expr[T] can yield a term of a subtype S <: T.
Staging would profit from some specialized constructs that mirror the ones used in inlining. In particular there should be a version of implicit match to concisely express an implicit search that may fail and that may bind type variables if it succeeds. Furthermore, it could be interesting to support the analogue of an inline match along the following lines
```
def macroImpl(x: Expr[Any]) = x.lower match {
 case x: Int => ...
 case x: Boolean => ...
}
```
Here .lower is the dual of lift. It goes from Expr[T] to T without going through splice. What we want to achieve here is moving semantic tests from the meta level to the language level. We are asking whether x is an expression tree of type Int or Bool by using the familiar match construct instead of some meta-level operation such as x.tpe.isSubtypeOf(IntType). This becomes particularly attractive once we need to bind type variables in the process. I am not sure whether lower this is the right way to achieve this (it probably isn't) but the goal should be clear.
We need some way to mitigate the problems of running user code coming from a macro library in typer. This is the biggest obstacle to overcome. There are many issues with running user code in typer:
- security risks
- resource leaks
- difficulties for editors and IDEs to implement this accurately
- need to precompile macros which prevents a natural integration of language and meta-level
- difficulty of implementing macros on platforms other than the JVM

I believe the best way to avoid these problems is to develop an interpreter for Tasty trees. This is what other languages with advanced macros do. It will be a lot of work, but, if completed, such an interpreter can be used in multiple ways:

execution engine for macros
to implement a REPL outside the JVM
an executable spec for TASTY
a starting point for a direct implementation in Truffle
expression evaluator for debugger
a general eval method*

(*) this would require the interpreter to be available also at run-time.

If a full Tasty interpreter is out of reach or too slow, one could also think about mapping transparently to a lower-level format (such as Scala.js IR or Native IR) and writing an interpreter for that. The tricky bit with that is that it requires transforming some code (the macro parts) independently from the rest of the program. Our transformation pipeline is close to supporting this, through the DenotationTransformer and InfoTransformer framework, but there are some parts where a tree transformation affects a visible type (i.e. where enteredAfter or installAfter is invoked). We'd have to compensate for that somehow.

odersky commented 5 years ago

If we want to do this there are many things to research and develop (I'd guess several papers and a PhD thesis or two worth of stuff). In particular:

Tasty interpreter
development of language constructs that allow expression of meta-level operations at the language level (e.g. analogues of inline match, implicit match, maybe also typeOf)
integration of staging and typer
an extended notion of constant value. Staging would like to deal with case class instances, sequences, arrays, as values, not as trees. I.e they should be eligible as inline parameters.
quotes for patterns to make rewrite rules easier to write
high-level APIs
IDE integration

This will require a lot of effort and some time. And the outcome is not a given, so it's a genuine research problem.

So, how can we combine this with the tight schedule for a Scala-3 release? I believe we have to hedge our bets.

To make progress in typeclass derivation, source inspection macros, and tree inspection macros we should build on what we have: Inlining/implicit matches for typeclass derivation and other typelevel operations, and Tasty reflection for the rest. We should concentrate on getting a meaningful community build up and running using these primitives.

At the same time and in parallel we could work towards the sketched unification of partial evaluation and staging. If and when this works out, we can switch over to the new scheme. If that step happens before the 3.0 release date, great! If it comes later, we can ship with what we have now, and switch over to the new scheme in 3.1 or 3.2. This would mean that the machinery of partial evaluation would have to come under an "experimental, unstable" language flag in 3.0, since it would be phased out again later. Staging is less affected by this, since its capabilities would be mostly increased by the unification (the only caveat is that the interpreter might restrict the macro programs that can be executed at compile-time).

soronpo commented 5 years ago

To make progress in typeclass derivation, source inspection macros, and tree inspection macros we should build on what we have: Inlining/implicit matches for typeclass derivation and other typelevel operations, and Tasty reflection for the rest. We should concentrate on getting a meaningful community build up and running using these primitives.

I agree. IMO, there are two "tracks" to be considered for Scala 3:

The "we can upgrade to Scala 3" track. We add any good macro-replacement feature we can so that a large part of the ecosystem can be build with Scala 3.0 when it's official.
The "we want to upgrade to Scala 3" track. These are the cool features that make us want to migrate to Scala. The second track changes its features from lessons learned during the first track.