rust-analyzer / rowan

Apache License 2.0
696 stars 57 forks source link

The great API simplification #9

Closed matklad closed 5 years ago

matklad commented 5 years ago

This PR greatly ~breaks~ simplifies the API by getting rid of RefRoot / OwnedRoot and leveraging native rust references instead.

To explain what's going on, it is helpful to start with the history of the current implementation.

Originally, the red-green tree approach was popularized by Roslyn. Red-Green solves allows one to have an immutable tree with cheap updates and with offests and parent pointers on nodes. At the first sight, this seems impossible: an offset of the node depends on the offsets of all of the previous nodes, so if you insert a new node at the beginning of the tree, then all offsets must be recalculated, resulting in O(N) performance. The same reasoning applies to parent links.

The sword to cut this knot is laziness. First, we build a green tree -- a usual immutable tree without parent pointers and with lengths instead of offsets (lengths can be maintained cheaply by summing lengths of children). Then, we build a lazy read tree. Each read node stores a green node, a parent red node, start offset and an array of red children which are calculated lazily. Traversing the tree is O(1): we either use a cached node (constant) or create a one new node (constant as well). When modifying the tree, we keep the green nodes, but tear down all the red nodes. This is still amortizing O(1) though, if we count traversals as well: to tear down a read node, you first need to traverse it.

Roslyn is implemented in C#, so their red-green relies on garbage collection for memory management.

The red-green implementation of red-green in C++ exists in Swift. To solve memory management, they added a third layer of SyntaxNodes. In Swifts libsyntax, green nodes are a usual immutable trees, red nodes own a green node and children, and store a raw pointer to the parent node. SyntaxNodes bind together a reference-counted pointer to the root of the tree and a raw pointer to the red node. Only SyntaxNode is a public API, and it guarantees that the underling red-node is alive. To be more concrete, here are equivalent Rust type definitions:

struct RedNode {
    offest: usize,
    parent: *const RedNode,
    children: Vec<Lazy<RedNode>>,
}

struct SyntaxNode {
    root: Arc<RedNode>,
    node: *const RedNode,
}

The problem with this approach is that traversing of the syntax tree constantly creates new SyntaxNodes, bumping the Arc. Ideally, reading the tree should just traverse the pointers, without write operations. It would also be cool to be able to work with Copy syntax nodes! And that's where the old implementation of rowan comes from. The idea of old implementation is to make the root field of the SynaxNode generic: root: R, so that R can be either &'a RedNode or a Arc<RedNode>. That way, you'll get to chose between owned version (arc bumps, .clone required, no lifetimes) and borrowed version (cheap, Copy, requires lifetime annotations). Unfortunately, being generic over ownership is not the most pleasant thing to do in Rust, and the resulting API was quite mind-bending and verbose.

This new API started from the desire to leverage usual reference and use &'a SyntaxNode for borrowed variant and SyntaxNode for owned variant. However, this doesn't quite work, due to the requirement of sharing data between SyntaxNodes. A similar situation happens if you are writing an application, and you have some Ctx struct which you'd like to store behind the Arc. If you do this, some parts of the app will work with Arc<Ctx> and some with &'a Ctx: note how owned variant have a smart pointer around. So, in this version a borrowed node looks like &'s SyntaxNode and an owned one as TreePtr<SyntaxNode>.

Specifically, we merge RedTree and SyntaxNode into one struct, and add a root: *const TreeRoot field to a node. Physically, this root is always stored inside an Arc. An owned TreePtr stores a *const SyntaxNode inside, but it bumps the root's Arc on creation/clone. That is, TreePtr<T> is basically and Arc to the tree root, except that it points not to the root itself, but to some node within a tree. This "manual" management of ref counts forms one bit of unsafely here.

The other bit of unsafely comes from the desire to be able to use TreePtr not only with raw SyntaxNodes, but with ast wrappers as well. Because ast and raw syntax use exactly the same binary representation (it's only compile-time types that differ), there's actually no implementation required for this to work: just some casts. So, TransparentNewType trait establishes this "it's safe to cast between newtypes" relation.

matklad commented 5 years ago

Here's a large scale adaptation of this API in rust-analyzer: https://github.com/rust-analyzer/rust-analyzer/pull/449

matklad commented 5 years ago

migrated rust-analyzer to this API, everything seems to work fine!

matklad commented 5 years ago

@flodiebold thanks for the review!

matklad commented 5 years ago

published: https://crates.io/crates/rowan/0.2.0