udoprog / syntree

A memory efficient syntax tree for language developers
51 stars 2 forks source link

Mention differences with Rowan with respect to modification #14

Open matklad opened 1 year ago

matklad commented 1 year ago

The main reason why Rowan doesn’t store syntax nodes in a contiguous chunk of memory is to enable cheap modification. Eg, after incremental parsing, the “new” and the “old” tree enjoy structural sharing. Similarly, Rowan’s refactoring API allows for somewhat efficient “piece-by-piece” modification of a tree without requiring to walk&rewrite the whole tree.

I think it’s somewhat important to call out this difference in the docs, as that’s a significant capability difference. Not that it makes Rowan better, mind you: I am still not quite satisfied with our refactoring APIs, and I now feel like Rowan is optimizing for the wrong use-case —- most of the trees are read-only.

If you’d like, you can send a PR to Rowan to add a related crates section —- I feel like this crate would be a better choice then Rowan in most scenarios!

udoprog commented 1 year ago

[..] without requiring to walk&rewrite the whole tree.

This is not a requirement I intend to keep any longer. Nodes no longer have to be strictly organized in walking order and could as a result use a slab for storage. This hasn't been implemented yet but I intend to once I complete the parser rewrite for Rune.

Structural sharing is however something worth highlighting. It's not a feature I make use of and if you could elaborate on how you use it that would be helpful!

matklad commented 1 year ago

Hm, but would you need to update every span anyway?

udoprog commented 1 year ago

If spans are enabled - yes. If they're not enabled then source reconstruction would have to depend on something stored in the syntax tree. Like a string identifier.

udoprog commented 1 year ago

I suppose you could also reconstruct spans by only storing lengths. Meaning you'd just have to sum the length of the preceding hierarchy. This would work well in a case of source reconstruction since you're bound to walk the tree anyway to accomplish this. And be reasonably efficient when looking up spans for errors and the like.