Mutable Immutable Syntax Trees

matklad commented 3 years ago

Mad Scientist Mode: ON

This issue proposes the next step of the evolution of our syntax tree library (rowan). The goal here is to provide an ergonomic and performant API for editing syntax trees.

Background & Status Quo

Providing nice editing API for syntax tree is hard for a classical "this is hard" reason -- aliasing.

Let's say you want to implement "move bounds to where clause" refactoring.

So, you have something like

fn foo<T: Eq>() where T: Clone {}

and you want to move Eq to Clone + Eq.

The natural way to code this would be something like this:

let t_param // T: Eq
  = func.generic_params().find_by_name("T")?;

let eq_bound // Eq
  = t_param.bounds().find_by_name("Eq")?;

let where_clause // T: Clone
  = func.where_clause().find_by_param_name("T")?;

t_param.remove_bound(&eq_bound);
where_clause.add_bound(&eq_bound)

That is, find where clause and original bound, remove the bound and snap it to the where clause.

To make this API work, we need to make it so that the tree change we affect via t_param.remove_bound call is visible to the where_clause local variable. That is, if after removal we go up the tree from where_clause and then go down, we shouldn't see the eq_bound.

Today, the syntax tree is immutable, so all editing API return a fresh copy of a tree. This works fine for point mutations, but breaks down when you want to modify two places at once.

With an immutable API, after you do t_param.remove_bound(&eq_bound);, the where_clause points to the wrong tree! Functional operations compose badly in this case.

To work around this, we use SyntaxRewriter class (stolen from Roslyn, naturally). It is configured with a replacement map, then walks the syntax tree applying substitutions from the map, reassembling a new tree in process. This class is awkward to use and also requires O(N) processing, as it essentially rebuilds the whole tree.

Proposed API

Let's just make mutable API work :-) This is not as scary as it sounds -- the underlying syntax tree is still immutable. The trick is to leverage GreenNode / SyntaxNode split once again to cut through yet another tradeoff.

As a reminder, the data of syntax trees is stored in a so-called "green" tree -- a standard Arced functional tree On top of GreenNode, we layer SyntaxNode, which adds parent pointers and absolute offsets. Additionally, SyntaxNodes are !Sync and leverage that to use Rc and thread-local allocators. Each SyntaxNode stores a pointer to a GreenNode.

We'll add mutating API to SyntaxNode. As SyntaxNode fundamentally alias each other via parent pointers, the API would use &self methods.

Mutation operations would still create independent copies of green nodes. However, they'll update all existing SyntaxNodes to point to new green nodes in the corresponding trees. As our database stores green nodes internally, mutations won't actually mutate DB data.

To make this API less surprising, we'll add a .clone_for_update call to opt-into mutation, so assists would work like this:

let original_file = db.parse(file_id);
let file = original_file.clone_for_update();
mutate(&file);
return diff(&original_file, &file);

Implementation

How do we implement the "update all SyntaxNodes" bits? I haven't coded this up yet, but I think this should be doable.

We add the following fields to SyntaxNode:

mutable: bool,
next_sibling: Weak<SyntaxNode>,
prev_sibling: Weak<SyntaxNode>,

These fields complete existing parent pointer to form bidirectionally-connected tree of live nodes. We also only update those pointes if mutable is true, which means we don't increase costs of traversal for majority of cases.

An important point is that we consider only live nodes: nodes stored in local variables and their ancestors. Two neighboring live nodes are not necessary tree neighbors.

Additionally, for mutable trees, we change semantics of the offset field -- it now stores relative offset in the parent node. To compute full offset, we walk the tree to the root. This is the same algorithm that IntelliJ uses.

If I am not missing something, that means that during modifications, only nodes to the path to root and their siblings need to be updated, which seems doable!

matklad commented 3 years ago

I plan to implement this as a safe toy impl first:

https://github.com/rust-analyzer/mini-rowan

The purpose is two-fold:

to actually make sure that the thing is implementable
to have a toy version of rowan for folks to learn from

matklad commented 3 years ago

I've updated mini-rowan to include offset API and token/node split. I think this should be good enough to port big rowan to this.

There are a couple more optimizations we might want to do, but I am not sure if it makes sense to start with them:

eagerly computing absolute offsets for immutable trees
skipping pointer-update stuff for immutable trees.

pksunkara commented 3 years ago

I am probably the only other person who's using the syntax trees (https://github.com/pksunkara/cargo-up). So, I will chime in a bit. 😄

After building the syntax tree, I have a visitor pattern on it as shown here. While visiting over these nodes, I am using the text_edit crate's TextEditBuilder to gather all the changes for the underlying code for now.

if let Some(parent) = path_expr.syntax().parent() {
    if let Some(call_expr) = CallExpr::cast(parent) {
        u.insert(
            call_expr.syntax().text_range().end(),
            ".author(crate_authors!()).version(crate_version!())",
        );
    }
}

When I finish walking/visiting, I am applying all these changes. This way, the syntax nodes are immutable while I am walking.

I am little confused if that property is going to stay if I use this new API instead of TextEditBuilder. Or do I need to do the changes on syntax that I get from clone_for_update?

iDawer commented 3 years ago

TL;DR: Consider adding a wrapper type for rowan::SyntaxNode that would handle mutation related stuff.

My proposal is trying to address issues listed in https://github.com/rust-analyzer/rust-analyzer/pull/7498#issuecomment-784359922.

So, what if rowan::SyntaxNode is left unchanged and a newtype, say rowan::TrackedSyntaxNode, made to track live nodes. Mutability becomes an opt-in from tree user point of view with all it's costs.

A quick sketch showing the design

```rust mod rowan { use std::{cell::Cell, ops::Range}; struct NodeData {/*snip*/} pub struct SyntaxNode { ptr: std::ptr::NonNull, } pub struct TrackedSyntaxNode { inner: SyntaxNode, mutable: bool, first: Cell<*const TrackedSyntaxNode>, next: Cell<*const TrackedSyntaxNode>, prev: Cell<*const TrackedSyntaxNode>, } impl SyntaxNode { pub fn clone_for_update(&self) -> TrackedSyntaxNode { todo!() } fn offset(&self) -> () { // it does not care about mutability todo!("self.data().offset") } } impl TrackedSyntaxNode { // SyntaxNode methods propagated here, with corrections for mutability-awared ones. fn offset(&self) -> () { self.offset_mut() } fn offset_mut(&self) -> () { todo!() } // mutation methods go here pub fn detach(&self) {} pub fn splice_children(&self, to_delete: Range, to_insert: Vec) {} } } mod rust_analyzer { pub mod syntax { pub mod ast { use super::ast_mut; use crate::rowan; pub trait AstNode { /// Editable counterpart of this AST node type Editable: Sized; // snip.. fn cast_editable(syntax: rowan::TrackedSyntaxNode) -> Option where Self: Sized; fn syntax(&self) -> &rowan::SyntaxNode; fn clone_for_update(&self) -> Self::Editable where Self: Sized, { Self::cast_editable(self.syntax().clone_for_update()).unwrap() } } pub struct GenericParamList { pub(super) syntax: rowan::SyntaxNode, } impl AstNode for GenericParamList { type Editable = ast_mut::GenericParamList; fn cast_editable( syntax: rowan::TrackedSyntaxNode, ) -> Option { Some(ast_mut::GenericParamList { syntax }) } fn syntax(&self) -> &rowan::SyntaxNode { &self.syntax } } } pub mod ast_mut { use crate::rowan; pub struct GenericParamList { pub(super) syntax: rowan::TrackedSyntaxNode, } } } mod ide_assists { use super::syntax::{ ast::{self, AstNode}, ast_mut, }; pub fn move_bounds_to_where_clause(acc: (), ctx: ()) -> Option<()> { let type_param_list: ast::GenericParamList = todo!("ctx.find_node_at_offset::()?"); let type_param_list: ast_mut::GenericParamList = type_param_list.clone_for_update(); // snip Some(()) } } } ```

The same is made for syntax tokens. This removes an allocation for syntax tokens during traversal of immutable tree.

A major drawback of my proposal is bloating syntax crate with ast_mut module. As an alternative to the extra module could be adding an enum like:

mod ast {
    use crate::rowan;

    enum SyntaxNode {
        Immutable(rowan::SyntaxNode),
        Mutable(rowan::TrackedSyntaxNode),
    }

    struct GenericParamList {
        syntax: SyntaxNode,
    }
}

Summarizing my proposal, the question is: in API level, does move of dispatching over mutability up-wise [from rowan internals into AST API] make sense?

If it looks feasible to you and I'm not missing tricky details, I'd like to tackle on it.

matklad commented 3 years ago

I don’t think duplicating the types would work: now every function that takes an ast::ParamList would require two versions. A possible work-around here is to have a single type which has mutability marker as a type parameter. But that has a similar problem: now everything consuming asts needs to be generic.

I think there’s a lot of simplicity in having a single non-generic type.

On Thursday, 11 March 2021, Dawer @.***> wrote:

TL;DR: Consider adding a wrapper type for rowan::SyntaxNode that would handle mutation related stuff.

My proposal is trying to address issues listed in #7498 (comment) https://github.com/rust-analyzer/rust-analyzer/pull/7498#issuecomment-784359922 .

So, what if rowan::SyntaxNode is left unchanged and a newtype, say rowan::TrackedSyntaxNode, made to track live nodes. Mutability becomes an opt-in from tree user point of view with all it's costs. A quick sketch showing the design

mod rowan { use std::{cell::Cell, ops::Range};
struct NodeData {/*snip*/}

pub struct SyntaxNode {
    ptr: std::ptr::NonNull<NodeData>,
}

pub struct TrackedSyntaxNode {
    inner: SyntaxNode,

    mutable: bool,
    first: Cell<*const TrackedSyntaxNode>,
    next: Cell<*const TrackedSyntaxNode>,
    prev: Cell<*const TrackedSyntaxNode>,
}

impl SyntaxNode {
    pub fn clone_for_update(&self) -> TrackedSyntaxNode {
        todo!()
    }

    fn offset(&self) -> () {
        // it does not care about mutability
        todo!("self.data().offset")
    }
}

impl TrackedSyntaxNode {
    // SyntaxNode methods propagated here, with corrections for mutability-awared ones.
    fn offset(&self) -> () {
        self.offset_mut()
    }

    fn offset_mut(&self) -> () {
        todo!()
    }

    // mutation methods go here
    pub fn detach(&self) {}
    pub fn splice_children(&self, to_delete: Range<usize>, to_insert: Vec<TrackedSyntaxNode>) {}
}
} mod rust_analyzer { pub mod syntax { pub mod ast { use super::ast_mut; use crate::rowan;
        pub trait AstNode {
            /// Editable counterpart of this AST node
            type Editable: Sized;
            // snip..
            fn cast_editable(syntax: rowan::TrackedSyntaxNode) -> Option<Self::Editable>
            where
                Self: Sized;
            fn syntax(&self) -> &rowan::SyntaxNode;
            fn clone_for_update(&self) -> Self::Editable
            where
                Self: Sized,
            {
                Self::cast_editable(self.syntax().clone_for_update()).unwrap()
            }
        }

        pub struct GenericParamList {
            pub(super) syntax: rowan::SyntaxNode,
        }

        impl AstNode for GenericParamList {
            type Editable = ast_mut::GenericParamList;
            fn cast_editable(
                syntax: rowan::TrackedSyntaxNode,
            ) -> Option<ast_mut::GenericParamList> {
                Some(ast_mut::GenericParamList { syntax })
            }
            fn syntax(&self) -> &rowan::SyntaxNode {
                &self.syntax
            }
        }
    }

    pub mod ast_mut {
        use crate::rowan;

        pub struct GenericParamList {
            pub(super) syntax: rowan::TrackedSyntaxNode,
        }
    }
}

mod ide_assists {
    use super::syntax::{
        ast::{self, AstNode},
        ast_mut,
    };

    pub fn move_bounds_to_where_clause(acc: (), ctx: ()) -> Option<()> {
        let type_param_list: ast::GenericParamList =
            todo!("ctx.find_node_at_offset::<ast::GenericParamList>()?");
        let type_param_list: ast_mut::GenericParamList = type_param_list.clone_for_update();
        // snip
        Some(())
    }
}
}

The same is made for syntax tokens. This removes an allocation for syntax tokens during traversal of immutable tree.

A major drawback of my proposal is bloating syntax crate with ast_mut module. As an alternative to the extra module could be adding an enum like:

mod ast { use crate::rowan;
enum SyntaxNode {
    Immutable(rowan::SyntaxNode),
    Mutable(rowan::TrackedSyntaxNode),
}

struct GenericParamList {
    syntax: SyntaxNode,
}
}

Summarizing my proposal, the question is: in API level, does move of dispatching over mutability up-wise [from rowan internals into AST API] make sense?

If it looks feasible to you and I'm not missing tricky details, I'd like to tackle on it.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/rust-analyzer/rust-analyzer/issues/6857#issuecomment-796933880, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANB3M2XBBQ7JGD577WBAQ3TDEBBNANCNFSM4UZQ2LOA .

lnicola commented 3 years ago

Closed by #7498?

matklad commented 3 years ago

jup!

rust-lang / rust-analyzer