scalameta / scalagen

WIP - Scalameta powered code generation
Apache License 2.0

Experiment: Use Tagged Tree's #24

Open DavidDudson opened 6 years ago

DavidDudson commented 6 years ago

Instead of traversing scalameta trees, let's create our own, based at the Member/Owner level rather than being as fine-grained as scalameta's.

This is still a fairly rough concept, but I think it's a far better approach than the current one, and it will scale.

case class MemberTree(name: FullyQualifiedName, ctx: GenContext, children: List[MemberTree])

The GenContext could contain information such as the original untouched scalameta tree, the current state of the scalameta tree, any semantic information, parameters etc.

With this approach we should be able to avoid any mutable state inside the generator itself (like the existing state necessary for transmutations).

This will also allow full control over traversal order.
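To make the proposal concrete, here is a minimal sketch of how such a tree could be rewritten without mutable state while leaving traversal order up to the caller. Only MemberTree, FullyQualifiedName and GenContext come from the comment above; the field layout of GenContext, the use of plain strings in place of scalameta trees, and the transformUp/transformDown helpers are assumptions made for illustration.

```scala
object MemberTreeSketch {
  // Assumed representation: a name is just its path segments.
  final case class FullyQualifiedName(parts: List[String]) {
    override def toString: String = parts.mkString(".")
  }

  // Assumed fields. `original` and `current` stand in for scalameta trees;
  // plain strings keep the sketch free of dependencies.
  final case class GenContext(
      original: String,
      current: String,
      params: Map[String, String] = Map.empty
  )

  final case class MemberTree(
      name: FullyQualifiedName,
      ctx: GenContext,
      children: List[MemberTree]
  ) {
    // Bottom-up: children are rewritten before their owner.
    // A new tree is returned; nothing is mutated.
    def transformUp(f: MemberTree => MemberTree): MemberTree =
      f(copy(children = children.map(_.transformUp(f))))

    // Top-down: the owner is rewritten first, then whatever children it now has.
    def transformDown(f: MemberTree => MemberTree): MemberTree = {
      val rewritten = f(this)
      rewritten.copy(children = rewritten.children.map(_.transformDown(f)))
    }

    // Pre-order listing of member names, owner first.
    def names: List[String] = name.toString :: children.flatMap(_.names)
  }
}
```

Because each step returns a fresh MemberTree, the untouched tree stays available in `ctx.original` and the choice between bottom-up and top-down is made by the runner rather than baked into the generator.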

olafurpg commented 6 years ago

We use a similar approach in scalafix with Patch and it's been working quite well; it helps composability a lot. I've actually wondered if scalagen could build on top of that pipeline by adding higher-level combinators common for codegen. One benefit of this approach is that it may be easier to do format/trivia-preserving transforms.

DavidDudson commented 6 years ago

I've had a look at scalafix and it seems scalagen & scalafix do similar, but different things at present.

Basically, it comes down to a few things.

Source vs Intermediate vs Compiler Trees:

Scalafix is very much about rewriting source code. This is where problems with retaining trivia etc. come into it. I would say rewriting to source is one of the strongest restrictions.

Macros write to an intermediate step in the compiler. The outputted code is almost never seen by the macro user. It does not have to be readable or maintainable. In fact, we can happily remove all trivia and formatting.

Scalagen sits in the middle. The primary use case at present is replacing annotation macros, which means our target is an intermediate stage between source and compiler. This code may be seen by the user, via jump to definition etc. Thus it has to retain some sane formatting.

It is reasonable to think that in the future, scalagen could be used to convert SQL tables into a series of case classes etc., or to generate Antlr visitors from .g4 files. We should not discount the need to output formatted code or code with trivia. However, I do not believe we need to worry about formatting/trivia at this intermediate step. It is not source code.

Targeted vs Generic:

Scalafix is a generic operation. It takes an entire tree and replaces matching clauses. This is also why it is so good for linting etc. It is up to the author of the rewrite to choose which trees get manipulated, not the user.

Macro annotations are targeted. The user has to specify exactly which trees are targeted.

Scalagen, however, is primarily targeted: currently by annotations, possibly by some additional means in the future.

Semantic vs Partial vs Syntactic:

Scalafix is semantic. Given the nature of rewrites, you want to be as accurate as possible with your results. If the code compiled before a rewrite is applied, it should compile after a rewrite has been applied. This is what makes scalafix so special when it comes to migration.

Macro Annotations are technically semantic. Scalameta/paradise annotations were purely syntactic.

I believe scalagen should be partially semantic, meaning any information we can have without compiling should be accessible: fully qualified names etc. We may want to add a separate flag to have full semantic information available should the need arise. However, most generators will be purely syntactic, and we do not want to compile twice unless absolutely necessary.
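As an illustration of what "partially semantic" could mean in practice, fully qualified names can be derived from syntax alone by threading the owner path down through the members, with no compiler run. The Member model below (Pkg, Cls, Def) is a hypothetical simplification, not scalagen's or scalameta's API.

```scala
object SyntacticNames {
  // Hypothetical, simplified member model; a real generator would
  // read this structure off the parsed scalameta tree.
  sealed trait Member {
    def name: String
    def members: List[Member]
  }
  final case class Pkg(name: String, members: List[Member]) extends Member
  final case class Cls(name: String, members: List[Member] = Nil) extends Member
  final case class Def(name: String) extends Member {
    val members: List[Member] = Nil
  }

  // Pre-order walk: each member's name is its owner's path plus its own name.
  def fullyQualified(m: Member, owner: List[String] = Nil): List[String] = {
    val path = owner :+ m.name
    path.mkString(".") :: m.members.flatMap(fullyQualified(_, path))
  }
}
```

Anything that needs types or resolved symbols (for example, knowing what an imported name refers to) falls outside this kind of walk, which is where the optional full-semantic flag would come in.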

Blackbox vs Whitebox:

Scalafix has Whitebox analysis but works in a blackbox scenario. (T => T)

Macro Annotations (paradise) are blackbox, ignoring companions.

Macro Annotations (scalamacros) can analyze in either whitebox or blackbox but cannot modify their siblings (ignoring companions).

Scalagen is more whitebox than all of these: the Transmutation generator allows generation of new siblings, but prevents manipulation of existing siblings.
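The sibling rule can be sketched as a function from one targeted statement to any number of statements, with the runner doing the splicing. The names Stat, Transmutation and run are stand-ins chosen for this sketch and should not be read as scalagen's actual signatures.

```scala
object TransmutationSketch {
  // Stand-in for a scalameta statement.
  final case class Stat(source: String)

  // Hypothetical shape: one targeted statement in, any number of statements out.
  type Transmutation = Stat => List[Stat]

  // The runner splices the generator's output in place of the target.
  // Existing siblings are never passed to the generator, so it can add
  // new siblings but has no way to change the ones already there.
  def run(
      siblings: List[Stat],
      isTarget: Stat => Boolean,
      gen: Transmutation
  ): List[Stat] =
    siblings.flatMap(s => if (isTarget(s)) gen(s) else List(s))
}
```

A plain manipulation (T => T) is the special case where the generator always returns exactly one statement.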

Token vs Structure:

Scalafix works at the token level.

Macro Annotations work at the structural level (scalac trees).

Scalagen works at the structural level (scalameta trees).

I do not think scalagen should work at the token level. However, I see the need to possibly reformat the output, or attach comments to generated definitions, all the while preserving existing comments. This matters primarily for source code generation, for tools like Antlr.

Given all of these differences, I wonder if scalafix does too much for scalagen. There is definitely overlap and it would be nice to deduplicate some of the common behavior. In fact, you could argue that Manipulation/Extension Generators are just targeted high-level scalafix rules. However, I would like to keep scalagen separate from scalafix for now. Provided the surface API remains the same, we can always replace the Runner with a scalafix Rule/Patch based system if necessary.

olafurpg commented 6 years ago

> However, I do not believe we need to worry about formatting/trivia at this intermediate step. It is not source code.

It's not a requirement to preserve trivia with scalafix; you can use tree transforms with replaceTree(tree, tree.transform { ... }.syntax) if you want.

> Macro annotations are targeted. The user has to specify exactly which trees are targeted.

My idea was that a scalafix rule can figure out which generator to run; the generator author should not have to find the tree nodes to expand.

> Scalafix is semantic

Scalafix has first-class support for both semantic and syntactic rules. Syntactic rules are implemented with object MyRule extends Rule(.... This actually turned out to be quite tricky to support in combination with dynamic rule classloading, but it works pretty great.

> Scalafix has Whitebox analysis but works in a blackbox scenario. (T => T)

I'm not sure I follow; scalafix rules are free to emit pretty much any output. Scalafix rules can even generate non-Scala code if they like, and the CLI even supports custom mappings to put the output in a different file with a different file extension (see --out-from and --out-to in https://scalacenter.github.io/scalafix/docs/users/installation#help).

> Scalafix works at the Token level

Scalafix can work on the tree level if you don't care about preserving formatting trivia.

> I do not think scalagen should work at the token level.

I agree!

> However, I would like to continue separated from scalafix for now.

I totally agree! I think there might be potential reuse with scalafix-testkit, which gives quite a nice workflow for developing rules. Just wanted to mention it's worth keeping in mind that the two tools are similar on many levels and maybe some work/ideas can be reused.