rust-analyzer / rowan

Apache License 2.0
697 stars 57 forks source link

New green tree experiment #45

Closed CAD97 closed 4 years ago

CAD97 commented 4 years ago

This PR is just to collect discussion about this experimental rewrite. Docs can be found here: https://cad97.github.io/rowan/rowan/

The rewrite uses a design for supporting thin pointers as Thin<P<T>> by implementing a trait ErasablePtr for all P that can into_raw, and a trait Erasable for all T that can convert from ptr::NonNull<?> to ptr::NonNull<T>.

GreenNode and GreenToken are thus made to be ?Sized + Erasable types, and a private GreenElement type is effectively used as union { Thin<Arc<GreenNode>>, Thin<Arc<GreenToken>> }. The outside world only ever sees NodeOrToken.

Additionally, we introduce a type ArcBorrow<'_, T> to represent a shared reference that is known to be behind an Arc. Unfortunately, we cannot directly make &GreenNode -> Arc<GreenNode> sound, as the outside world is (theoretically) allowed to move even ?Sized types out of an Arc (e.g. into a Box, just like Box -> Arc can be done even with ?Sized types).

All construction of nodes/tokens is proxied through a GreenBuilder in order to encourage reuse of nodes/tokens. Additionally, we eliminate unnecessary copies when building tokens by storing the cache as a map from (kind, &str) pairs to the token. This uses unsafe trickery to have the key point directly into the value's text to avoid adding a borrow from the outside world.

This experiment is starting from a scratch, erased repository for ease of reimplementation, but if we want to adopt some or all of the ideas, I will re-implement them on top of the existing history in a much more cooperative manner. I've not yet extended this experiment to the cursor and api modules, but intend to do so. I suspect that those modules will see little to no API surface change (they're already near optimal imho), so adopting this new green API would not be too invasive in theory.

matklad commented 4 years ago

but I believe this is not problematic in practice; any user of GreenTreeBuilder almost certainly already is carrying around the lifetime for the source text.

That's not true: https://github.com/rust-analyzer/rust-analyzer/blob/cc1ef95d1962a0ca503b059c2275edde66d9b130/crates/ra_mbe/src/syntax_bridge.rs#L350 parses from token tree, not from text, so it synthesizes token text on the fly

CAD97 commented 4 years ago

83c5bf2 now does some unsafe wizardry to avoid the lifetime by synthesizing a &'static str value-borrow.

CAD97 commented 4 years ago

This draft has now re-added the syntax/api view. (Again, see the docs at https://cad97.github.io/rowan/rowan/.) These are unified by using a impl Language called Generic as the default language parameter type.

I plan to re-add the syntax tree differentiating between nodes and tokens. It is for this reason that we have Language::is_token: this allows language implementations to choose to implement deferred parsing. This is done by instead of expanding a node into a GreenNode and proper children, just storing it as a GreenToken of the correct kind. This kind is then used to determine if the node is a token or not. Care will have to be taken to make sure that iterators behave sensibly in the presence of non-token leaves: do they care about including/skipping tokens, or do they actually care about leaves?

After re-adding that, I think this experiment just needs me to port the s-expressions example and some more of the existing tests, then it's ready to be properly reviewed for design. I'll repeat though, if we decide to adopt any of the ideas used here, it'll probably be better to re-adapt them than adopt this experiment as-is. The one exception being adopting this wholesale, but even as such, it should be squashed and avoid taking out tests that we already have written.