onflow / cadence

Cadence, the resource-oriented smart contract programming language 🏃‍♂️
https://cadence-lang.org
Apache License 2.0
533 stars 138 forks source link

Decouple AST and related code from parser #2549

Open turbolent opened 1 year ago

turbolent commented 1 year ago

Issue to be solved

The parser currently depends on the AST. The AST package not only contains AST element types, but also several other functionality, like JSON serialization and code formatting, and their dependencies (JSON and pretty printing modules).

For some use-cases, a parse of a full program into an AST is not needed. For example, the parser may only be used to check for syntactical validity, so allocations of AST nodes is not necessary at all.

Other use-cases may only be interested in the structure of parts of the program. For example, SDKs and the Playground may only be interested in parameter lists of publicly accessible functions.

For such use-cases, the full parser might be too large. For example, currently compiling the parser to WebAssembly results in a binary that is several MB large.

Suggested Solution

Move the AST-related code, the construction of AST elements, out of the parser.

Replace the current AST element constructor calls with "generic" function calls, which can be implemented in various ways, depending on need. The default implementation will continue to produce a full AST.

We might need to reduce or even eliminate the performance overhead of this abstraction. There are several option to do so:

bluesign commented 1 year ago

Generics are a bit problematic for JS/wasm outputs with tinygo, goprherjs etc. Interface seems most elegant solution, if it doesn't bring too much overhead.

For JS output my experiments with leanest possible parser and ast, with no imports other than "utf8" ( no prettier, fmt, strings etc )

03% common
25% ast
06% lexer
40% parser 

Rest is JS overhead with builtin go stuff ( runtime ) , slice, array, copy etc

I believe, if possible, we can do a less forgiving grammar for stable cadence and remove ambiguity cases. This would also make parser a lot of leaner.