healeycodes / nodots-lang

š§š A small programming language with an interpreter, and a WebAssembly compiler.
https://healeycodes.com/profiling-and-optimizing-an-interpreter

Suggestions #1

Open ysharplanguage opened 1 year ago

ysharplanguage commented 1 year ago

Hi there, I enjoyed reading Profiling and Optimizing an Interpreter: great points all the way!

Just thought I might share a recent experiment of mine with AST-walking interpreters that I dubbed "full linearization". You might want to give the implementation technique a shot.

If that worked so nicely for C# as the implementation language, who knows, this might benefit Python, too.

Full linearization in LinearLisp

In a nutshell:

You collapse the whole AST into a single big, flat, one-dimensional array, and your eval / apply evaluation core runs a cursor over it.

All references to objects on the heap (formerly nodes, child lists, attributes, etc.) are now gone. The whole AST is just an array of integers. That's great for cache locality, and it turns out it indeed pays off...
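To make the idea concrete, here is a minimal sketch in Python (a simplified layout, not LinearLisp's actual encoding, which is described further below): a tiny arithmetic AST gets flattened into one list of ints and is then evaluated by running a cursor over it.

```python
# Minimal sketch of the idea (simplified layout, not LinearLisp's):
# flatten a tiny arithmetic AST into one flat list of ints, then
# evaluate it by moving a cursor: no node objects in the loop.

ADD, MUL, CONST = 0, 1, 2

def linearize(node, out):
    # node is an int literal or a tuple like ("+", left, right)
    if isinstance(node, int):
        out += [CONST, 0, node]                 # opcode, arity 0, value
    else:
        op, *kids = node
        out += [ADD if op == "+" else MUL, len(kids)]
        for kid in kids:
            linearize(kid, out)                 # children follow in pre-order
    return out

def eval_at(code, i):
    # returns (value, next cursor); recursion rides the host language's stack
    op, arity = code[i], code[i + 1]
    if op == CONST:
        return code[i + 2], i + 3
    left, i = eval_at(code, i + 2)
    right, i = eval_at(code, i)
    return (left + right if op == ADD else left * right), i

code = linearize(("+", ("*", 2, 3), 4), [])     # (2 * 3) + 4
print(code)                 # [0, 2, 1, 2, 2, 0, 2, 2, 0, 3, 2, 0, 4]
print(eval_at(code, 0)[0])  # 10
```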

Shootout via answer on SO

Cheers!

healeycodes commented 1 year ago

@ysharplanguage thank you for the idea/resource! I was looking for more examples of this kind of thing but couldn't find anything at the time.

ysharplanguage commented 1 year ago

@healeycodes Welcome!

Linearization in LinearLisp...

( n + 1 )

becomes:

12, 3, 4, 5, 6, -1, 0, 18, 0, -15, 0, -20

From left to right:

12 gives the total subtree length to come;

3 is its arity (i.e., the number of children: "n", "+", "1");

then 4, 5, 6 are the relative offsets to each child, from left to right: 4 points to the "0, 18" that ensues... i.e., the "n" symbol in the current environment (an entry in the symbol table);

5 points to "0, -15" later on... the builtin "+" operator (note the negative sign for builtins vs. programmer-defined identifiers);

6 points to "0, -20" finally... that'd be the literal constant "1", interned once in the constant pool (also a negative index by convention; you can shove both interned literals and language builtins into the same pool);

and the -1 in the middle is a redundant marker ending the list of child offsets... it's only useful for builtins with variable arity that don't want to bother comparing the counts of formal vs. actual arguments or children.

Thus, that's how one can get away with an AST walk without any heap allocation or garbage collector in the loop (when it's for the sole purpose of the tree walk, anyway) and still be able to eval anything just by using the host (implementation language's) natural stack.
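Here's a toy reading of that exact array in Python, just to ground the walkthrough above. The leaf shape (0, ref), the offsets being relative to the slot that holds them, and the lookup tables are assumptions made for the sketch, not necessarily LinearLisp's exact rules:

```python
# Toy walk of the example array above (layout details are assumptions;
# the real LinearLisp encoding may differ). Index arithmetic only:
# no node objects, no heap traffic during the walk.

code = [12, 3, 4, 5, 6, -1, 0, 18, 0, -15, 0, -20]

env  = {18: 41}                        # hypothetical: symbol "n" -> 41
pool = {-15: lambda a, b: a + b,       # hypothetical: builtin "+"
        -20: 1}                        # hypothetical: interned literal 1

def eval_flat(code, i):
    if code[i] == 0:                   # assumed leaf shape: (0, ref)
        ref = code[i + 1]
        return env[ref] if ref >= 0 else pool[ref]
    arity = code[i + 1]                # code[i] is the subtree length (unused here)
    # assume each child offset is relative to the slot holding it:
    # 2 + 4 = 6, 3 + 5 = 8, 4 + 6 = 10 for the example above
    kids = [eval_flat(code, (i + 2 + k) + code[i + 2 + k]) for k in range(arity)]
    n, plus, one = kids                # source order of ( n + 1 ): "n", "+", "1"
    return plus(n, one)

print(eval_flat(code, 0))              # 41 + 1 -> 42
```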

Cf. the Linearize and Rehydrate methods in the code... together they form the two-way bijection between the original S-expr (the initial AST, right after parsing) and the linearized version (which is all the evaluator eventually consumes: just one 1-dimensional array, super friendly to modern CPU caches :)
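A sketch of what the Rehydrate direction could look like under the same assumed layout, rebuilding a nested structure from the flat array (handy for debugging or pretty-printing):

```python
# Inverse direction under the same assumed layout (hypothetical tables):
names      = {18: "n"}                 # symbol-table names
pool_names = {-15: "+", -20: 1}        # shared-pool entries

def rehydrate(code, i=0):
    if code[i] == 0:                   # leaf: (0, ref)
        ref = code[i + 1]
        return names[ref] if ref >= 0 else pool_names[ref]
    arity = code[i + 1]
    return [rehydrate(code, (i + 2 + k) + code[i + 2 + k]) for k in range(arity)]

print(rehydrate([12, 3, 4, 5, 6, -1, 0, 18, 0, -15, 0, -20]))   # ['n', '+', 1]
```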

This encodes all the needed tree info yet allows for fast navigation throughout the eval / apply semantic core, without the GC ever getting in the way with its obsessive tracking of which refs are dead vs. alive, etc.

Cheers!

ysharplanguage commented 1 year ago

@healeycodes I've added a few microbenchmarks, including one that is always especially important to keep an eye on as an interpreter implementor:

For-Loop-vs-Lexical-Scope-Stress.py

Efficient AST walking or efficient bytecode is only half of the struggle against a sluggish implementation, imo...

The real troublemakers in interpreted dynamic languages are dynamic environments and lexical scope, combined with closure representation; those may be the second (if not the first) hardest thing to get both right and efficient, in my experience.
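For flavor, here is a hypothetical sketch of the kind of kernel such a benchmark stresses (not the actual For-Loop-vs-Lexical-Scope-Stress.py): every iteration creates a fresh lexical environment and a closure over it, so environment and closure representation dominate the cost, not the AST walk itself.

```python
# Hypothetical stress kernel (illustrative only, not the real benchmark):
# the cost is dominated by allocating a new environment + closure per
# iteration and by the free-variable lookups inside it.

def make_adder(base):
    def adder(x):
        return x + base                # free variable -> closure capture
    return adder

def stress(n):
    total = 0
    for i in range(n):
        add = make_adder(i % 7)        # fresh environment + closure each time
        total += add(i)
    return total

if __name__ == "__main__":
    print(stress(1_000_000))
```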

healeycodes commented 1 year ago

@ysharplanguage Amazing! Thanks for expanding your thoughts here. This is really helpful.