sebastienros / esprima-dotnet

Esprima .NET (BSD license) is a .NET port of the esprima.org project. It is a standard-compliant ECMAScript parser (also popularly known as JavaScript).
BSD 3-Clause "New" or "Revised" License
425 stars 75 forks source link

Add generic AstVisitor #370

Open GerardSmit opened 1 year ago

GerardSmit commented 1 year ago

This PR adds a generic AstVisitor (AstVisitor<T>).

This allows value types to be returned without boxing, for example a double to make a simple calculator:

https://github.com/sebastienros/esprima-dotnet/blob/977356b3aee00ba5d884d2a9eec8c66635bf9ee2/test/Esprima.Tests/AstVisitorTests.cs#L105-L141

AstVisitor extends AstVisitor<object?> so it's still possible to use inherit AstVisitor without making a breaking change.

adams85 commented 1 year ago

Although I've only got to take a quick look at the proposal yet, I have doubts because making virtual methods generic doesn't usually come without a cost:

Esprima.Benchmark.VisitorBenchmark

Diff Method Mean Error Allocated
Old (.NET 6.0) Visit 10.34 ms 4.592 ms 16 B
New (.NET 6.0) 13.76 ms (+33%) 1.815 ms 16 B (0%)
Old (.NET Framework 4.7.2) Visit 10.00 ms 2.900 ms 0 B
New (.NET Framework 4.7.2) 13.76 ms (+38%) 3.110 ms 0 B
Old (.NET 6.0) Rewrite 20.17 ms 5.604 ms 33 B
New (.NET 6.0) 24.44 ms (+21%) 7.346 ms 33 B (0%)
Old (.NET Framework 4.7.2) Rewrite 27.27 ms 4.751 ms 0 B
New (.NET Framework 4.7.2) 30.70 ms (+13%) 1.566 ms 0 B

I like the idea in general but don't like the idea of taking the perf. hit in cases where reference return values are enough (or returns are not needed at all). And to me, this seems the (much?) more frequent case. I can't really see too many use cases which need value type returns. But maybe it's just my ignorance as e.g. Roslyn probably doesn't provide a non-generic and a generic visitor implementation with parallel non-generic and generic Accept methods by accident.

Such a parallel solution would solve the perf. issue but would add to our maintenance burden significantly. Thinking of which, I don't really like this idea either...

Adding a generic Accept without the corresponding generic visitors might be something to consider though. This way who really need value type returns could implement the generic visitor for themselves.

But it's also worth noting that there may be other viable ways to handle this problem, for example object pooling.

These are my first thoughts. @lahma @jogibear9988 How about you?

GerardSmit commented 1 year ago

Huh, I didn't know that it would have this much of an impact on the performance. Good catch. I always thought that a generic method would "magically" create a new method with the generic types and having the same(ish) result.

I tried to change to make the final class sealed, and change the abstract class to an interface but both had the same performance hit.

I agree that the performance hit is too much for this to be merged, and having to maintain two methods (on every node) for both visitors isn't ideal either.

Alternatively we could create a source generator which

  1. Creates a AstVisitor<T> class based on the AstVisitor class.
  2. Adds T Accept<T>(AstVisitor<T> visitor) to every node.

But this is also more code to maintain which brings the question if it's worth it.

sebastienros commented 1 year ago

What if there is a generic version and a non generic one that inherits from the Visitor<object> ? Would be non-breaking, but would the perf of the object one be the same?

sebastienros commented 1 year ago

Next time I will read the PR before commenting

lahma commented 1 year ago

Thank you @adams85 for reviewing and checking the perf impact! I think we need to be very careful in this regard as a lot of the esprima 3 (vnext) effort has been around performance (and of course features and fixes). I'd say @adams85 is the best person to give feedback on design as he's been the driving for around AST visitors (thank you for that!).

I'm trying to understand the use case here. If it's about crafting an interpreter, maybe that should be a separate part - just like Jint is built on top of the esprima AST and does the JavaScript interpretation. So in that way there's no need to "transform" return values.

jogibear9988 commented 1 year ago

for me the perf hit does not matter, cause I only use esprima.net to parse and rewrite javascript files once during application startup. maybe we could generate a copy of the Visitor, a generic one and a non generic one? Maybe the copy could be generated via Source Generators? Maybe add an Additional AcceptGeneric Method to each node?

adams85 commented 1 year ago

Thank you all, guys, for sharing your opinions. They certainly helped my thought process.

I'm trying to understand the use case here. If it's about crafting an interpreter, maybe that should be a separate part - just like Jint is built on top of the esprima AST and does the JavaScript interpretation. So in that way there's no need to "transform" return values.

I can imagine a use case where you just want to evaluate some simple JS expressions (maybe only a very limited subset of what the language allows). I did something similar in my PO parser library. (Though that one doesn't evaluate the expression but transforms it into a compiled lambda expression.) Anyway, the point is that such simple transformations wouldn't really require more than a visitor. And when you evaluate e.g. numeric expressions, it'd be nice to be able to return a value type to avoid boxing.

Alternatively we could create a source generator [...]

maybe we could generate a copy of the Visitor, a generic one and a non generic one? Maybe the copy could be generated via Source Generators? Maybe add an Additional AcceptGeneric Method to each node?

This is a nice idea, definitely worth exploring! I haven't had the time to delve into it but I suspect that a bunch of visitor-related boilerplate code could be generated. I think of stuff like

There are a few special cases here and there but the whole thing follows a pretty solid pattern. We'd need to annotate our AST nodes to let the generator know which are the child properties, what is the default order of their visitation and probably a few other pieces of relevant information and we could generate about 90% of what I enumerated above. Handcrafting the remaining bits wouldn't be a big deal then.

In that case adding a generic version of Accept and visitors shouldn't be too much work either. As a matter of fact, this is the only viable way of adding this to the library which I can see now as I came to the conclusion that we'd better not take the performance hit. Which means the non-generic version must be kept as is, that is, the generic one could only be done separately.

The main problem is that implementing such a generator is quite a task... But I'm sure it'd be pretty damn exciting as well. So I may be not able to resist to give it a try. :) Of course, if anyone else has intentions like this, just let me know.

sebastienros commented 1 year ago

@adams85 NB: I am also learning that you created a PO file parser (we did too in Orchard Core), and finally that you should start using Parlot.

adams85 commented 1 year ago

I am also learning that you created a PO file parser (we did too in Orchard Core)

I know, took the idea of using PO for localization from Orchard (from the "original", not the Core version). 😄

you should start using Parlot.

Thanks for the tip. Looks like a nice lib, I'll take a closer look!

adams85 commented 1 year ago

@GerardSmit

Alternatively we could create a source generator which

  1. Creates a AstVisitor class based on the AstVisitor class.
  2. Adds T Accept(AstVisitor visitor) to every node.

FYI, this is possible now: https://github.com/sebastienros/esprima-dotnet/pull/372

Do you feel like rewriting your PR based on this maybe?