whitequark / ast

A library for working with Abstract Syntax Trees.
MIT License
195 stars 25 forks source link

How do you handle node references? #30

Open akimd opened 3 years ago

akimd commented 3 years ago

Some processing, such as typing, are easier to deal with when you install references from uses to definitions. Typically, I first traverse my tree, paying attention to scopes, and binding local variable uses to their corresponding variable definition. This is super handy in later traversals, as you can very easily recover any meta-data left on the variable definition wherever you have a variable use.

However, since this library enforces an immutable approach, this pattern is very fragile: if I merely add an annotation to some variable definition, I'll get a completely different object, and all my bindings will be incorrectly pointing to the old version.

Is there a recommended pattern to handle such cross-node references? I'm used to imperative style ASTs (where this is straightforward), and I feel kind of stupid here :-(

iliabylich commented 3 years ago

As an option you could do something similar to what astrolable gem does, i.e. a custom subclass with a mutable hash property.

In general I guess it's impossible to have immutable data structure with cross-references (like doubly linked list; to make a reference from A to B you have to change B, but you can only create a new B, but that requires creating new A, etc). I'd suggest to either switch to a different data pattern (like zipper) or drop a mutability requirement. The latter seems to be easier.

akimd commented 3 years ago

Hi Ilya, thanks for the quick response. I didn't know the Zipper pattern, I'm looking into it.

But I agree dropping immutability seems nice. Actually, I am also concerned by speed, and I believe mutability can still buy me some perfs. However I enjoyed using this library. Do you have any experience with perverting it to the mutable side of the force? Is this completely insane, or just ok? At first sight introducing something like update! seems to do the job, but I might be missing something big.

Cheers!

iliabylich commented 3 years ago

Do you have any experience with perverting it to the mutable side of the force?

AFAIK there's no way in Ruby to "unfreeze" an object.

Is this completely insane, or just ok?

I think it's ok if you need it.

At first sight introducing something like update! seems to do the job, but I might be missing something big.

Other libraries do it too, for example rubocop uses exactly the same pattern to track parents.

akimd commented 3 years ago

The rubocop link is precious, there are plenty of interesting things in there, thanks a lot!

However I don't think mutable_properties will suffice in my case, as I'm also rewriting my trees. Yet I want to maintain references. And if I rewrite a subtree, I'd like to avoid having to rebuild everything around it.

So I'd need to edit the ast library to get rid of the freeze.

mbj commented 3 years ago

I'd dislike this the ast gem dropping immutability. Dunno if that is proposed here.

There are lots of patterns to associate data with immutable objects. Worst case you can use a data overlay referencing the individual ast nodes #object_id value which is unique enough to associate mutable data with. But before going into that area I'd try to exhaust plenty of alternatives.

In unparser (which has to track lots of context) the immutability of the Parser::AST::Node instances is never a problem, and I track several codata structures through the emitter hierarchy, emitters are also itself immutable, Also unparser is still fast enough for mass processing of AST nodes.

I suggest to research into functional data structures for inspiration. I'd also be happy to help to provide an immutable pattern to associate data with @akimd should he be willing to share more of his use case details.

akimd commented 3 years ago

Hi Markus. Thanks for dropping in.

I'd dislike this the ast gem dropping immutability. Dunno if that is proposed here.

No, I'm not. I'm only considering to do it locally, in my environment. I don't think that unparser is shaking the AST a lot, so I'm not sure your mileage applies to my case. Likewise for Rubocop. But of course, I might wrong.

I'm looking for a means to rewrite my trees while maintaining reference.

Let's consider a simple example, à la Ruby (my trees are not about Ruby, my case is different from the topic we exchanged about, Markus, and about which I still don't have feedback to provide yet). It's not a particular example of what I'm doing, but it is quite typical (I think :).

Say I have

x := 0 + 0
puts(x)

where := denotes variable definition (not a mere assignment).

I have a first pass that binds the variable-use of x in the puts to its definition: simply store the definition node in the metadata of the use. (I don't maintain "forward references" from the definition to its uses.) This is used for typing for instance.

Then I have a pass that rewrites the tree, for instance for constant folding:

x := 0
puts(x)

With an immutable structure, the variable-definition is no longer the same object, so I have to update all the references in its uses. This is what I'd like to avoid. Not to mention that if that bit lives in a massive tree, the tree still will have to be recreated because of this rewriting. Since I like to have several simple passes rather than a single big one, that cost would be repeated.

That's why I'm considering hacking ast locally to have support for mutable trees. What would be your choice?

Cheers!

whitequark commented 3 years ago

Having worked on similar applications, I strongly suggest looking into a separate mutable structure that keeps e.g. variable metadata. For example, you could associate an environment with variable values to every AST node. This mutable structure would be updated by the passes, and later, once your pass pipeline is done, your application would query you for the information.

Although term rewriting is conceptually simple and works well for straightforward cases it is liable to limit you later, either because the AST rewriting is costly (this is what I hit, since I used term rewriting on immutable ASTs), because mutable AST nodes being aliased (e.g. cached) can lead to subtle semantic bugs, because mutated AST makes it harder to show good diagnostics, and finally, because it constrains you only to analyses which can be expressed as term rewrites.

mbj commented 3 years ago

Having worked on similar applications, I strongly suggest looking into a separate mutable structure that keeps e.g. variable metadata.

Strong concur, this is what I'd do.

akimd commented 3 years ago

Hi all,

Thanks for all your inputs!

Having worked on similar applications, I strongly suggest looking into a separate mutable structure that keeps e.g. variable metadata. For example, you could associate an environment with variable values to every AST node. This mutable structure would be updated by the passes, and later, once your pass pipeline is done, your application would query you for the information.

The example I gave is probably misleading. What I'm going, in a way, is sort of source-to-source transformation. For instance one of my passes goes from an expression-oriented language to a form compatible with statement-oriented languages, so I have significant tree transformations to run. And that's where I would like to keep as much meta-data as possible.

So yes, I have initial passes that analyses the ast, but then later passes do have to act on them. My question was really about these tree rewriting passes.

mbj commented 3 years ago

@akimd I personally am out of options to help, without seeing the actual source code. I could write up some source-to-source transformation, but the chance I hit your exact use case problem is slim.

akimd commented 3 years ago

There is actually much to show :)

For instance this snippet extracts rich expressions (for instance structures such as Ruby's expression-oriented case) from the arguments of function calls, and defines local variables for these values and passes them to the function call:

    def on_call(code)
      id, args = code[:id, :args]
      vars, nargs = extract_variables(args)
      if vars.any?
        make_block(exps: vars + [code.updated(nil, [id, nargs])])
      else
        code
      end
    end

This is really plain ast, with a few additions (such as naming the children, which matters in our case since we persists the trees, and want to support the addition of new children in the future).

But since some subtrees can be rewritten, the visitors much pay attention to always recreate the whole tree. But that's very much what is recommended in the processors:

    def process(code)
      handler = :"on_#{code.sid}"
      if !code.macro && respond_to?(handler)
        send handler, code
      else
        code.updated(nil, process_children(code))
      end
    end

I have understood that the addition of something like Rubocop's mutable_properties will save us from some "useless" rewrites for merely labeling the tree with additional metadata. As for rewriting, I must study more closely the proposal you guys made here.

Cheers!