Closed matklad closed 3 years ago
I plan to implement this as a safe toy impl first:
https://github.com/rust-analyzer/mini-rowan
The purpose is two-fold:
I've updated mini-rowan to include offset
API and token/node
split. I think this should be good enough to port big rowan to this.
There are a couple more optimizations we might want to do, but I am not sure if it makes sense to start with them:
I am probably the only other person who's using the syntax trees (https://github.com/pksunkara/cargo-up). So, I will chime in a bit. 😄
After building the syntax tree, I have a visitor pattern on it as shown here. While visiting over these nodes, I am using the text_edit
crate's TextEditBuilder
to gather all the changes for the underlying code for now.
if let Some(parent) = path_expr.syntax().parent() {
if let Some(call_expr) = CallExpr::cast(parent) {
u.insert(
call_expr.syntax().text_range().end(),
".author(crate_authors!()).version(crate_version!())",
);
}
}
When I finish walking/visiting, I am applying all these changes. This way, the syntax nodes are immutable while I am walking.
I am little confused if that property is going to stay if I use this new API instead of TextEditBuilder
. Or do I need to do the changes on syntax that I get from clone_for_update
?
TL;DR: Consider adding a wrapper type for rowan::SyntaxNode
that would handle mutation related stuff.
My proposal is trying to address issues listed in https://github.com/rust-analyzer/rust-analyzer/pull/7498#issuecomment-784359922.
So, what if rowan::SyntaxNode
is left unchanged and a newtype, say rowan::TrackedSyntaxNode
, made to track live nodes. Mutability becomes an opt-in from tree user point of view with all it's costs.
The same is made for syntax tokens. This removes an allocation for syntax tokens during traversal of immutable tree.
A major drawback of my proposal is bloating syntax
crate with ast_mut
module.
As an alternative to the extra module could be adding an enum like:
mod ast {
use crate::rowan;
enum SyntaxNode {
Immutable(rowan::SyntaxNode),
Mutable(rowan::TrackedSyntaxNode),
}
struct GenericParamList {
syntax: SyntaxNode,
}
}
Summarizing my proposal, the question is: in API level, does move of dispatching over mutability up-wise [from rowan internals into AST API] make sense?
If it looks feasible to you and I'm not missing tricky details, I'd like to tackle on it.
I don’t think duplicating the types would work: now every function that takes an ast::ParamList would require two versions. A possible work-around here is to have a single type which has mutability marker as a type parameter. But that has a similar problem: now everything consuming asts needs to be generic.
I think there’s a lot of simplicity in having a single non-generic type.
On Thursday, 11 March 2021, Dawer @.***> wrote:
TL;DR: Consider adding a wrapper type for rowan::SyntaxNode that would handle mutation related stuff.
My proposal is trying to address issues listed in #7498 (comment) https://github.com/rust-analyzer/rust-analyzer/pull/7498#issuecomment-784359922 .
So, what if rowan::SyntaxNode is left unchanged and a newtype, say rowan::TrackedSyntaxNode, made to track live nodes. Mutability becomes an opt-in from tree user point of view with all it's costs. A quick sketch showing the design
mod rowan { use std::{cell::Cell, ops::Range};
struct NodeData {/*snip*/} pub struct SyntaxNode { ptr: std::ptr::NonNull<NodeData>, } pub struct TrackedSyntaxNode { inner: SyntaxNode, mutable: bool, first: Cell<*const TrackedSyntaxNode>, next: Cell<*const TrackedSyntaxNode>, prev: Cell<*const TrackedSyntaxNode>, } impl SyntaxNode { pub fn clone_for_update(&self) -> TrackedSyntaxNode { todo!() } fn offset(&self) -> () { // it does not care about mutability todo!("self.data().offset") } } impl TrackedSyntaxNode { // SyntaxNode methods propagated here, with corrections for mutability-awared ones. fn offset(&self) -> () { self.offset_mut() } fn offset_mut(&self) -> () { todo!() } // mutation methods go here pub fn detach(&self) {} pub fn splice_children(&self, to_delete: Range<usize>, to_insert: Vec<TrackedSyntaxNode>) {} }
} mod rust_analyzer { pub mod syntax { pub mod ast { use super::ast_mut; use crate::rowan;
pub trait AstNode { /// Editable counterpart of this AST node type Editable: Sized; // snip.. fn cast_editable(syntax: rowan::TrackedSyntaxNode) -> Option<Self::Editable> where Self: Sized; fn syntax(&self) -> &rowan::SyntaxNode; fn clone_for_update(&self) -> Self::Editable where Self: Sized, { Self::cast_editable(self.syntax().clone_for_update()).unwrap() } } pub struct GenericParamList { pub(super) syntax: rowan::SyntaxNode, } impl AstNode for GenericParamList { type Editable = ast_mut::GenericParamList; fn cast_editable( syntax: rowan::TrackedSyntaxNode, ) -> Option<ast_mut::GenericParamList> { Some(ast_mut::GenericParamList { syntax }) } fn syntax(&self) -> &rowan::SyntaxNode { &self.syntax } } } pub mod ast_mut { use crate::rowan; pub struct GenericParamList { pub(super) syntax: rowan::TrackedSyntaxNode, } } } mod ide_assists { use super::syntax::{ ast::{self, AstNode}, ast_mut, }; pub fn move_bounds_to_where_clause(acc: (), ctx: ()) -> Option<()> { let type_param_list: ast::GenericParamList = todo!("ctx.find_node_at_offset::<ast::GenericParamList>()?"); let type_param_list: ast_mut::GenericParamList = type_param_list.clone_for_update(); // snip Some(()) } }
}
The same is made for syntax tokens. This removes an allocation for syntax tokens during traversal of immutable tree.
A major drawback of my proposal is bloating syntax crate with ast_mut module. As an alternative to the extra module could be adding an enum like:
mod ast { use crate::rowan;
enum SyntaxNode { Immutable(rowan::SyntaxNode), Mutable(rowan::TrackedSyntaxNode), } struct GenericParamList { syntax: SyntaxNode, }
}
Summarizing my proposal, the question is: in API level, does move of dispatching over mutability up-wise [from rowan internals into AST API] make sense?
If it looks feasible to you and I'm not missing tricky details, I'd like to tackle on it.
— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/rust-analyzer/rust-analyzer/issues/6857#issuecomment-796933880, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANB3M2XBBQ7JGD577WBAQ3TDEBBNANCNFSM4UZQ2LOA .
Closed by #7498?
jup!
Mad Scientist Mode: ON
This issue proposes the next step of the evolution of our syntax tree library (
rowan
). The goal here is to provide an ergonomic and performant API for editing syntax trees.Background & Status Quo
Providing nice editing API for syntax tree is hard for a classical "this is hard" reason -- aliasing.
Let's say you want to implement "move bounds to where clause" refactoring.
So, you have something like
and you want to move
Eq
toClone + Eq
.The natural way to code this would be something like this:
That is, find where clause and original bound, remove the bound and snap it to the where clause.
To make this API work, we need to make it so that the tree change we affect via
t_param.remove_bound
call is visible to thewhere_clause
local variable. That is, if after removal we go up the tree fromwhere_clause
and then go down, we shouldn't see theeq_bound
.Today, the syntax tree is immutable, so all editing API return a fresh copy of a tree. This works fine for point mutations, but breaks down when you want to modify two places at once.
With an immutable API, after you do
t_param.remove_bound(&eq_bound);
, thewhere_clause
points to the wrong tree! Functional operations compose badly in this case.To work around this, we use
SyntaxRewriter
class (stolen from Roslyn, naturally). It is configured with a replacement map, then walks the syntax tree applying substitutions from the map, reassembling a new tree in process. This class is awkward to use and also requires O(N) processing, as it essentially rebuilds the whole tree.Proposed API
Let's just make mutable API work :-) This is not as scary as it sounds -- the underlying syntax tree is still immutable. The trick is to leverage
GreenNode
/SyntaxNode
split once again to cut through yet another tradeoff.As a reminder, the data of syntax trees is stored in a so-called "green" tree -- a standard
Arc
ed functional tree On top ofGreenNode
, we layerSyntaxNode
, which adds parent pointers and absolute offsets. Additionally,SyntaxNode
s are!Sync
and leverage that to useRc
and thread-local allocators. EachSyntaxNode
stores a pointer to aGreenNode
.We'll add mutating API to
SyntaxNode
. AsSyntaxNode
fundamentally alias each other via parent pointers, the API would use&self
methods.Mutation operations would still create independent copies of green nodes. However, they'll update all existing
SyntaxNode
s to point to new green nodes in the corresponding trees. As our database stores green nodes internally, mutations won't actually mutate DB data.To make this API less surprising, we'll add a
.clone_for_update
call to opt-into mutation, so assists would work like this:Implementation
How do we implement the "update all
SyntaxNode
s" bits? I haven't coded this up yet, but I think this should be doable.We add the following fields to
SyntaxNode
:These fields complete existing
parent
pointer to form bidirectionally-connected tree of live nodes. We also only update those pointes ifmutable
istrue
, which means we don't increase costs of traversal for majority of cases.An important point is that we consider only
live
nodes: nodes stored in local variables and their ancestors. Two neighboring live nodes are not necessary tree neighbors.Additionally, for mutable trees, we change semantics of the offset field -- it now stores relative offset in the parent node. To compute full offset, we walk the tree to the root. This is the same algorithm that IntelliJ uses.
If I am not missing something, that means that during modifications, only nodes to the path to root and their siblings need to be updated, which seems doable!