tdammers / ginger

A Haskell implementation of the Jinja template language.
MIT License
77 stars 13 forks source link

Track constness / purity for optimizations #23

Open tdammers opened 6 years ago

tdammers commented 6 years ago

One of the optimization steps we perform is compile-time evaluation of subexpression that are completely known at compile time.

E.g., in the template <h1>{{ "hello & goodbye"|capitalize }}</h1>, everything we need to calculate the correct output is already known at runtime - we have a string literal, the pure capitalize function, an interpolation statement, and two literal output statements. This means that we can evaluate the entire thing at compile time, and replace it with just one literal output statement: <h1>Hello &amp; Goodbye</h1>.

The problem however lies in accurately determining const-ness and purity: in order to judge whether a value is pure, we need to know what's in the current scope, but the scope contains the context passed in at runtime.

A proposed solution would work as follows.

First, we define a data structure that represents our knowledge about the current execution context. Essentially, this would be a list of constraints on names in the current namespace, and such a constraint might be something like "is definitely a pure function", "is a constant value", etc.

While traversing the template AST, we can then thread this knowledge data structure through the traversal, updating it as we go. E.g., when we encounter a let block that binds a literal to some variable, then we can add an entry to our knowledgebase that says that this variable now holds a const value. Or, when a variable is bound to a lambda expression that we can infer as pure, we add to our knowledgebase the factoid that this variable now holds a pure function.

Now, the conservative approach is to start with an empty slate: we know nothing, but we gain knowledge as we traverse the AST. However, this isn't going to be optimal in the typical use case, where all the built-in global bindings are intact - many builtin functions are in fact pure, and while the host application can change this, people will rarely do this in a way that changes contracts - so we need to cater for the possible case of changing purity semantics of builtins, but at the same time, we also don't want to miss out on the possible optimization when the semantics are intact.

A fairly simple way to solve this is to put the responsibility in the hands of the caller: simply add an argument to the parser that is used to seed the knowledge base. Host applications that never touch any of the builtins can simply pass the "default" knowledge base, telling the parser to generate code that will be correct as long as the host application does in fact stay away from the builtins. Host applications that do touch builtins could either simply pass an empty seed knowledge, or they could modify the default knowledge to reflect the changes they intend to make. Either way though, it's going to be an honor system.

To make it safer, one would have to tag GVals themselves with purity information, and then check whether the passed-in context matches what was declared at compile time. Since this involves deriving the seed knowledge from the context provided by the host, exposing this derivation makes sense, because then host applications can reuse the code to automatically generate the right seed knowledge for the context they intend to pass to the template. It would also require remembering the seed knowledge in the template itself, so that it becomes possible to check that the runtime context matches the compile-time promises. And if it doesn't, we can either reject the template (conservative choice), or we can run it anyway ("dangerous mode"), or we can fall back to an alternative compilate that assumes nothing (empty context), or we could simply fall back to the entirely unoptimized version of the AST.