rust-analyzer / rowan

Apache License 2.0

Trailing objects in GreenNode #32

Closed CAD97 closed 4 years ago

CAD97 commented 4 years ago

Supersedes #31 by doing it for real this time. In addition, the laundry list of assumptions has been reduced to just the alignment of GreenNode and the offsets of GreenNode's fields, which are tested but not checked at runtime. The one unwritten assumption is that there is no padding after GreenNode.children, which I think is indeed guaranteed by #[repr(C)].
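The kind of layout assumption being tested here can be sketched with a sized stand-in for the dynamically sized GreenNode (names and the `[usize; 0]` element type are illustrative, not rowan's actual tests):

```rust
use std::mem::{align_of, offset_of, size_of};

type SyntaxKind = u16;
type TextUnit = u32;

// Sized stand-in: a zero-length trailing array models the flexible
// `children` tail of the real DST GreenNode.
#[repr(C)]
struct GreenHead {
    kind: SyntaxKind,
    text_len: TextUnit,
    children: [usize; 0],
}

fn main() {
    // repr(C) fixes field order: kind at 0, two bytes of padding,
    // text_len at 4, children at the next usize boundary.
    assert_eq!(offset_of!(GreenHead, kind), 0);
    assert_eq!(offset_of!(GreenHead, text_len), 4);
    assert_eq!(offset_of!(GreenHead, children), 8);
    // The struct ends exactly where the trailing array begins,
    // i.e. there is no padding after `children`.
    assert_eq!(size_of::<GreenHead>(), 8);
    assert_eq!(align_of::<GreenHead>(), align_of::<usize>());
}
```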

I still expect this to be neutral or negative on performance. GreenNode used to be 3xusize and Arc<GreenNode> is now 2xusize, but GreenElement is still 5xusize due to storing the 4xusize GreenToken inline. The internal cursor::NodeData remains 5xusize large, and SyntaxNode, SyntaxToken, and SyntaxElement are likewise unchanged.

Realistically, this removes one pointer chase in NodeData to get to the green node's children at the cost of some tricky unsafe allocation. It also adds a pointer chase in GreenElement to get to a GreenNode's kind or len.

CAD97 commented 4 years ago

ra is ported to use this branch at https://github.com/rust-analyzer/rust-analyzer/pull/2104 for perf analysis.

matklad commented 4 years ago

Still don't have time to take a closer look now, but I remember a couple of ideas I had for dealing with big tokens:

|node header|child node 1|child node 2|child node 3|token 1|token 2|token 3|

The latter approach is interesting, because it can, in theory, give us cheap O(1) access to the nth child. So, something like trait.name would do something like self.children().get(3), rather than the current iteration.

CAD97 commented 4 years ago

To sketch the layout/sizes that we currently have (just looking at usize=u64):

master:

GreenElement { // align 8, size 40
    tag: enum { Node, Token }, // align 1/8, size 1/8
    // padding 7
    payload: union { // align 8, size 32
        node: GreenNode, // align 8, size 24
        token: GreenToken, // align 8, size 32
    }
}

GreenNode { // align 8, size 24
    kind: SyntaxKind, // align 2, size 2
    // padding 2
    text_len: TextUnit, // align 4, size 4
    children: Arc<[GreenElement]>, // align 8, size 16
}

GreenToken { // align 8, size 32
    kind: SyntaxKind, // align 2, size 2
    text: SmolStr, // align 8, size 24
}

"Two array":

GreenNode { // align 8, size dyn
    text_len: TextUnit, // align 4, size 4
    child_count: u16, // align 2, size 2
    node_kind: b(child_count), // `child_count` bits to specify interleaving of nodes/tokens
                               // this also gives us the sizes of nodes/tokens via count_ones/count_zeros
    // padding to 8; none when `child_count % 64 == 16`
    nodes: [Arc<GreenNode>], // align 8, size count*16
    tokens: [GreenToken], // align 8, size count*32
}
// note: vla would be `[u8]` from rustc's POV and encompass everything after `child_count`

GreenToken { // align 8, size 32
    kind: SyntaxKind, // align 2, size 2
    text: SmolStr, // align 8, size 24
}
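The `node_kind` bitmask idea can be sketched as follows; `child_slot` and the bit convention (1 = node, with bit `i` describing child `i`) are hypothetical illustrations, not rowan code:

```rust
/// Given the interleaving bitmask and a child index, report whether
/// child `i` is a node or a token, and its index within the
/// corresponding `nodes`/`tokens` array.
fn child_slot(node_kind: u64, i: u32) -> (bool, u32) {
    let is_node = node_kind & (1 << i) != 0;
    // Bits below position i: counting them tells us how many of the
    // earlier children were nodes (count_ones) vs tokens.
    let prefix = node_kind & ((1u64 << i) - 1);
    let idx = if is_node {
        prefix.count_ones()
    } else {
        i - prefix.count_ones()
    };
    (is_node, idx)
}

fn main() {
    // Children 1, 2, 4 are nodes; children 0, 3 are tokens.
    let mask = 0b10110u64;
    assert_eq!(child_slot(mask, 0), (false, 0)); // token 0
    assert_eq!(child_slot(mask, 2), (true, 1));  // node 1
    assert_eq!(child_slot(mask, 3), (false, 1)); // token 1
}
```

This is what makes nth-child access O(1): one popcount replaces a linear scan over the children.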

> So, something like trait.name would do something like self.children().get(3), rather than the current iteration.

  TRAIT_DEF@[0; 16)
    TRAIT_KW@[0; 5) "trait"
    WHITESPACE@[5; 6) " "
    NAME@[6; 7)
      IDENT@[6; 7) "X"
    WHITESPACE@[7; 8) " "
    ITEM_LIST@[8; 16)
      L_CURLY@[8; 9) "{"
      WHITESPACE@[9; 15) "\n    \n"
      R_CURLY@[15; 16) "}"
  TRAIT_DEF@[0; 16)
    VISIBILITY@[0; 3)
      PUB_KW@[0; 3) "pub"
    WHITESPACE@[3; 4) " "
    TRAIT_KW@[4; 9) "trait"
    WHITESPACE@[9; 10) " "
    NAME@[10; 11)
      IDENT@[10; 11) "X"
    WHITESPACE@[11; 12) " "
    ITEM_LIST@[12; 16)
      L_CURLY@[12; 13) "{"
      WHITESPACE@[13; 15) "\n\n"
      R_CURLY@[15; 16) "}"

I suppose if "get the nth non-token child" were a cheap operation, the layout could be changed to support it, but currently "nth non-token child" doesn't mean anything stable, because optional nodes are simply absent rather than occupying a placeholder slot. Also, for losslessness, it adds the need to keep track of the interleaving of the two arrays.

I still need to figure out the exact cost of a variable-length encoding scheme.
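For reference when costing this out, a variable-length encoding in the LEB128 style (an assumption for illustration; the thread doesn't fix a concrete scheme) looks like:

```rust
/// Encode a u32 as a LEB128-style varint: 7 payload bits per byte,
/// high bit set on every byte except the last.
fn encode_varint(mut v: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
}

/// Decode a varint, returning the value and how many bytes it used.
fn decode_varint(buf: &[u8]) -> (u32, usize) {
    let (mut v, mut shift, mut i) = (0u32, 0, 0);
    loop {
        let b = buf[i];
        i += 1;
        v |= ((b & 0x7f) as u32) << shift;
        if b & 0x80 == 0 {
            return (v, i);
        }
        shift += 7;
    }
}

fn main() {
    let mut buf = Vec::new();
    encode_varint(300, &mut buf);
    assert_eq!(buf, [0xac, 0x02]); // small lengths take 1-2 bytes
    assert_eq!(decode_varint(&buf), (300, 2));
}
```

The trade-off being weighed: most `text_len`s fit in one or two bytes, but decoding adds a data-dependent branch to every length read.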

CAD97 commented 4 years ago

Having a VLA and VLE seems like a big win IF we also make it a thin pointer... which requires inlining an Arc implementation... which is... scary... but I'm going to try it out, because who doesn't like a hard problem?

GreenNode { // align(u64) size(dyn)
    head: { // align(u64) size(u64)
        text_len: TextUnit, // u32
        kind: SyntaxKind, // u16
        child_count: u16,
    }
    children: [u64],
    // 0u64 ArcGreenNode // (u64, usize, (u32 padding if usize=u32))
    // 1u64 GreenToken // (u64, SyntaxKind, (u48 padding), SmolStr)
                        // NB: cannot compress this due to requirement of &GreenToken
}
matklad commented 4 years ago

That’s not related to the PR really, but let me write down this idea...

Looks like the problem we have is that tokens are large. And the main reason for that is that SmolStr is three pointers large. Should we replace it... with interning?

I am super against rustc-style interning, which uses thread-locals and indices: I really want the syntax tree to be a value type with, ahem, no strings attached. That was the primary motivation for SmolStr.

But what if we do file-local interning using Arc? The tree will still be a value type, but tokens will be smaller, and not many additional allocations will be required. As a first cut, we can perhaps retrofit this onto the existing green node cache: just box & intern all the tokens. I think this has a fair chance of reducing total memory usage: the size of a token goes down to a single pointer, and almost all tokens are shared by many nodes. With this optimization, I feel that the trailing objects approach might actually be a net win: we box all tokens and nodes, and NodeOrToken is one pointer long (we stick the tag into the pointer's last bit).
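The tag-in-low-bit idea can be sketched like this (all names hypothetical; the point is only that alignment >= 2 frees bit 0 of any valid pointer to carry the node/token discriminant):

```rust
use std::sync::Arc;

// Minimal stand-ins for boxed green nodes and tokens.
struct Node(u32);
struct Token(u32);

// Both payloads are at least 4-byte aligned, so the low bit of a
// valid pointer to them is always zero and can store the tag.
fn pack_node(p: *const Node) -> usize { p as usize }       // tag 0
fn pack_token(p: *const Token) -> usize { p as usize | 1 } // tag 1

fn is_token(bits: usize) -> bool { bits & 1 == 1 }
fn as_node(bits: usize) -> *const Node { bits as *const Node }
fn as_token(bits: usize) -> *const Token { (bits & !1) as *const Token }

fn main() {
    let node = Arc::new(Node(7));
    let token = Arc::new(Token(9));
    let a = pack_node(Arc::as_ptr(&node));
    let b = pack_token(Arc::as_ptr(&token));
    assert!(!is_token(a));
    assert!(is_token(b));
    // The Arcs are still alive here, so dereferencing is sound.
    unsafe {
        assert_eq!((*as_node(a)).0, 7);
        assert_eq!((*as_token(b)).0, 9);
    }
}
```

This is what lets a NodeOrToken child slot collapse to a single usize.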

matklad commented 4 years ago

Sorry for the super-confusing wording: this is me failing spectacularly to get a good night's rest before tomorrow's flight :-)

CAD97 commented 4 years ago

A tree-local deduped boxing of tokens definitely would allow the array to be much more regular. And it would eliminate a lot of the trickiness required in #33 which I personally think has crossed the line into "too clever".

I considered the idea of boxing the tokens earlier and just overlooked the ability to dedupe them. That approach is definitely worth trying out, so I'll play with it over the weekend.
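A minimal sketch of such tree-local token deduplication, assuming a plain HashMap keyed by (kind, text) (names hypothetical, not rowan's cache):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified token: real rowan stores a SmolStr, not a String.
#[derive(Clone, Hash, PartialEq, Eq)]
struct GreenToken {
    kind: u16,
    text: String,
}

#[derive(Default)]
struct TokenCache {
    map: HashMap<GreenToken, Arc<GreenToken>>,
}

impl TokenCache {
    /// Return a shared, boxed token; equal (kind, text) pairs
    /// always yield the same Arc allocation.
    fn token(&mut self, kind: u16, text: &str) -> Arc<GreenToken> {
        let key = GreenToken { kind, text: text.to_string() };
        self.map
            .entry(key)
            .or_insert_with_key(|k| Arc::new(k.clone()))
            .clone()
    }
}

fn main() {
    let mut cache = TokenCache::default();
    let a = cache.token(1, " ");
    let b = cache.token(1, " ");
    // Identical whitespace tokens are deduplicated to one allocation,
    // and a child slot shrinks to a single Arc pointer.
    assert!(Arc::ptr_eq(&a, &b));
    assert_eq!(cache.map.len(), 1);
}
```

The tree stays a value type: the cache is only consulted during construction, and the resulting Arcs are ordinary shared ownership with no thread-local state.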

CAD97 commented 4 years ago

Closing in favor of #34.