oxc-project / backlog

Backlog for collaborators only

Make `Atom<'a>` and `CompactString` nice and fast #46

Open Boshen opened 9 months ago

Boshen commented 9 months ago

This is a continuation of

Where the requirements are:

Make our own version of Atom<'a> and CompactString.

These two types should be immutable and inline-able. The size of these two types (either 16 or 32 bytes) should depend on real-world statistics of identifier lengths.

overlookmotel commented 9 months ago

The size of these two types (either 16 or 32 bytes) should depend on real-world statistics of identifier lengths.

I'm actually not sure how much of an optimization inlining will be. The main advantage of compact_str's CompactString over std's String in most use cases is that CompactString often avoids allocating. But that doesn't apply in OXC, because Atom<'a> can just reference a slice of the source text - so no allocation either way.

Maybe it will still be a solid gain due to cache locality and faster string comparison, but I think we'd need to measure to know either way.

That could inform what the ideal size is - make it bigger to inline more often vs smaller to reduce the size of the AST.

Boshen commented 9 months ago

After some consideration, I believe we will be satisfied with the current implementation for a significant period. Although it's tempting to pursue further optimization, we should direct our energy toward endeavors with broader impact.

pub enum Atom<'a> {
    Arena(&'a str),
    Compact(CompactString),
}
overlookmotel commented 9 months ago

OK cool. I do have a WIP already, so may finish it up when I get a chance, but yeah, the main problems (memory leak and unsoundness) are now solved.

Boshen commented 9 months ago

Reopening since you have a WIP.

Boshen commented 9 months ago

I think I got what you mean now.

Atom<'a> is a pointer into the allocator. We should keep Atom<'a> on the stack for small string optimization, and it's an O(1) operation when converting to CompactString.

Smart.

overlookmotel commented 9 months ago

Yes, Atom<'a> is either:

  1. Pointer to source text/arena + length, or
  2. Up to 16 bytes of string data stored inline within the Atom<'a> itself.

So, yes, it's an O(1) operation (and a very cheap one) converting an inline Atom<'a> to an inline CompactStr (when the string is 16 bytes or less) - because all the data is inline.

However, if the string is longer than 16 bytes, conversion is O(n) because the string content will need to be copied to the general heap, so that CompactStr can own that heap allocation. The longer the string, the higher the cost.

But my guess is that the majority of strings will be 16 bytes or less (because they are predominantly identifiers, which tend to be quite short in JS), so it'll be O(1) mostly.

If OXC transformer/semantic/etc can just use Atom<'a>, without converting to CompactStr, then zero conversion cost and no heap allocations at all. The downside is that then you have to deal with the 'a lifetime.
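
To illustrate, here is a minimal sketch of such a two-representation Atom<'a>. This is not oxc's actual layout - a real implementation would pack the discriminant and length more cleverly to hit the 16-byte size target - but it shows the two cases described above:

enum Atom<'a> {
    // Pointer + length into the arena or source text; no allocation.
    Arena(&'a str),
    // Up to 16 bytes of string data stored inline in the Atom itself.
    Inline { len: u8, bytes: [u8; 16] },
}

impl<'a> Atom<'a> {
    fn as_str(&self) -> &str {
        match self {
            Atom::Arena(s) => s,
            // In this sketch, inline bytes are assumed to always be valid UTF-8.
            Atom::Inline { len, bytes } => std::str::from_utf8(&bytes[..*len as usize]).unwrap(),
        }
    }
}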

hyf0 commented 8 months ago

I'd like to mention that:

overlookmotel commented 8 months ago

@hyf0 Just to expand on what you've said:

"Inline" means any string 16 bytes or less. The "n" in "O(n)" is the length of the string.

The only difference between Atom<'a> and CompactStr is that CompactStr owns a heap allocation storing the string data, whereas Atom<'a> references string data in the arena allocator - the allocator owns the data, and Atom<'a> just references it ('a is the lifetime of the allocator).

It is unavoidable that there's a cost to copying strings from the general heap to the arena, or in the other direction - and that cost increases the longer the string is.

The inline optimization is intended to ameliorate this cost for the common case - strings of 16 bytes or less - which I imagine covers most JS identifiers (but please tell me if your experience shows otherwise).

OXC's AST cannot contain types which own heap allocations, due to its use of an arena allocator (discussed on https://github.com/rolldown-rs/rolldown/issues/427). For consumers who really want to maximize performance, the best route is to work with Atom<'_> instead of CompactStr. This can be somewhat painful because of dealing with the lifetime, but it's the only way to "go with the grain" of OXC's design while preserving all possible performance. CompactStr is provided just as a convenience where a consumer finds it impractical to have a lifetime.

NB: The above refers to Atom<'_> and CompactStr as they will be, not as they are now. So far, we've got the API to its intended shape, but the inline optimization is not implemented yet.
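
As a rough usage sketch of that trade-off (using the compact_str crate's CompactString as a stand-in for oxc's CompactStr, since the final type isn't settled here):

use compact_str::CompactString;

// "Going with the grain": keep the arena lifetime, zero conversion cost.
fn keep_borrowed<'a>(name: &'a str) -> &'a str {
    name
}

// Convenience route: drop the lifetime by taking ownership. Short strings stay
// inline; longer ones are copied to the general heap (the O(n) case above).
fn take_owned(name: &str) -> CompactString {
    CompactString::from(name)
}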

overlookmotel commented 8 months ago

@hyf0 Just to add: The use of an arena allocator in OXC, and all the lifetimes it necessitates, is an unusual pattern in Rust. Personally I see it as a feature not a bug - it's extremely performant, but beyond that, OXC essentially managing its own memory also enables some really significant things which aren't otherwise possible.

However, at present it can be really painful for the consumer. I think we could better document patterns for interfacing with the allocator and AST which take advantage of it, and make the lifetimes (almost) painless. Such patterns do exist, but they're not immediately obvious.

So in short, from what I understand, the problem that you're facing is very real. But there may be a different solution from the one you're looking for with CompactStr.

hyf0 commented 8 months ago

I was wondering if we could make the string type generic, like Program<'_, S>, so it could gain advantages in every scenario.

Use Program<'a, Atom<'a>> for oxc_module_lexer, the linter, and other scenarios that don't need to interact with strings frequently.

Use Program<'a, OtherStr> for the transformer or bundler, which require certain optimizations for the string type.
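
A rough sketch of the shape of that proposal (type and field names here are hypothetical, purely for illustration):

use std::marker::PhantomData;

// Every AST node has to carry the extra string type parameter `S`.
struct ProgramGeneric<'a, S> {
    body: Vec<StatementGeneric<'a, S>>,
}

enum StatementGeneric<'a, S> {
    VariableDeclaration { name: S, _marker: PhantomData<&'a ()> },
}

// Most users would use a default alias; only advanced users would pick a different `S`:
// type Program<'a> = ProgramGeneric<'a, Atom<'a>>;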

overlookmotel commented 8 months ago

This is not my decision to make, but my view is this isn't a great solution for 2 reasons:

Firstly, it's not just Program<'a> which would need to become Program<'a, S>. Every AST type would also need it e.g. ExpressionStatement<'a, S>, Expression<'a, S>, IfStatement<'a, S> etc.

This would be very unergonomic, and would make the API pretty horrible for other external consumers of OXC. So much so, that the solution I've proposed for handling different Span types in https://github.com/oxc-project/oxc/issues/959#issuecomment-1982302527 went to great lengths to avoid exactly this, because I thought it was clearly unacceptable for readability/ergonomics.

Secondly, if your motivator is performance, my guess is that it wouldn't help with that. Yes, the string type wouldn't require conversions, so that's a gain, but on the other side, the whole AST would have to become Drop, otherwise you have memory leaks - which means a complete tree traversal when dropping an AST. This is just a guess, but my assumption is that the negative would outweigh the positive, and performance would be worse overall, not better.


It seems I have ended up being the guy who just says "no" all the time! That's not my nature, and it gives me no pleasure. I totally get why you're looking at every angle to find a solution, but it's just that I don't think this circle can be squared in the way you want. It's a fundamental limitation/feature/trade-off in the nature of the arena allocator which OXC uses.

More positively, can I suggest:

  1. Can we wait until the inline optimization to Atom<'a> and CompactStr is implemented, and then see how bad this performance problem is in practice after that?
  2. Could we have a proper "in person" chat about all this some time? I get the general idea, but the exact problem you're trying to solve is still a bit unclear to me. Maybe if we can clarify it, I may be able to help find a different solution which sidesteps this issue.

I'm afraid I wouldn't be able to do either of those things in the near future, but if we can put this on the back burner for a while, I'd be very happy to do that.

In meantime, if you have time, could you possibly outline in a bit more detail exactly what problem you're hitting?

hyf0 commented 8 months ago

Cool. I wasn't saying we have to do it this way - just trying to offer thoughts for inspiration.

This would be very unergonomic

Yeah, it may be. But since we are already dealing with lifetimes, I guess if we do it this way, it's just another DX problem that needs to be handled. For example, most users would use the default AST type, like type Program<'a> = ProgramGeneric<'a, Atom<'a>>, with no difference compared to the current type. Only advanced users would use the generic form.

Again, these are thoughts just for inspiration, not something we have to do.

Secondly, if your motivator is performance, my guess is that it wouldn't help with that. Yes, the string type wouldn't require conversions, so that's a gain, but on the other side, the whole AST would have to become Drop, otherwise you have memory leaks

Yeah. You can see that a Drop which actually does something takes a lot of time here. But the thing for oxc is that the oxc AST could by default have an empty drop method - the question is what the cost of calling these empty methods would be.

If the cost is on-demand (only when you use the owned-string type) and the empty Drop has little cost, I guess the whole Drop thing won't be a problem anymore.

Can we wait until the inline optimization to Atom<'a> and CompactStr is implemented, and then see how bad this performance problem is in practice after that?

Totally. I'm in no hurry for this, but I'm trying to have a collision of ideas - it's hard for me to find people to discuss such things with.

Could we have a proper "in person" chat about all this some time?

This would be wonderful. Since this is not an urgent problem, we could wait for the right timing. I like how you pursue perfection on oxc - I've learned a lot of practical Rust knowledge from that.

overlookmotel commented 8 months ago

OK, let's leave the broader issue of owned/borrowed strings for the time being and come back to it when we both have time to really get into the discussion.

Just to come back on a couple of points briefly:

Yeah. You can see that a Drop which actually does something takes a lot of time here. But the thing for oxc is that the oxc AST could by default have an empty drop method - the question is what the cost of calling these empty methods would be.

If the cost is on-demand (only when you use the owned-string type) and the empty Drop has little cost, I guess the whole Drop thing won't be a problem anymore.

Yes, you're right. But what I'm saying is that in your use case, using the owned string type variant, you would have a Drop which does a complete tree traversal, so it would cost you.

Totally. I'm in no hurry for this, but I'm trying to have a collision of ideas - it's hard for me to find people to discuss such things with.

I have the same problem! Always very happy to discuss.

This would be wonderful. Since this is not an urgent problem, we could wait for the right timing. I like how you pursue perfection on oxc - I've learned a lot of practical Rust knowledge from that.

OK great. Agreed. By the way, it's very kind of you to say that. In return, congratulations on getting Rolldown in shape to be open sourced in such a short time. Brilliant work.

overlookmotel commented 7 months ago

Not working on this at present, but just wanted to make a note of a thought I had, before it gets lost in the ether...

I am wondering if we should have two different string types:

  1. For identifiers
  2. For everything else

It strikes me that the 2 are used in quite different ways. Identifiers are constantly being compared to each other to calculate scopes etc, and tend to be small. Other strings (e.g. string literals) can be much larger, and are generally just moved around rather than compared.

So maybe we could have two implementations optimized for the two different use cases. In particular, fast hashing is likely a priority for identifiers, inlining is probably worthwhile, and maybe even interning. But all these things would be much less useful for string literals etc, so maybe they should just be a wrapper around &str, and avoid the overhead of a "clever" implementation which never gets used.

If we did go with interning, maybe identifier strings could even be stored in a separate long-lived arena. So individual ASTs can come and go, but the identifiers stay. Maybe this could help with @hyf0's use case, even in the presence of Rolldown's incremental compilation. Obviously, multi-threading also comes into play, so this isn't simple. But maybe...

Just a thought...

hyf0 commented 7 months ago

Good thought! Sounds like an advanced version of https://github.com/rust-analyzer/smol_str.

overlookmotel commented 6 months ago

A thought to come back to later:

IdentifierReference contains both Atom<'a> and ReferenceId. Can we get rid of Atom here, and store the atom in the reference which ReferenceId points to?

https://github.com/oxc-project/oxc/blob/b9d69ad665ab386233f940e9f1a9b3095a2238dc/crates/oxc_ast/src/ast/js.rs#L423-L431
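
A hypothetical sketch of that restructuring (Span, ReferenceId, and the Reference fields below are simplified stand-ins, not oxc's actual definitions):

struct Span { start: u32, end: u32 }
struct ReferenceId(u32);

// The AST node would keep only the ID...
struct IdentifierReference {
    span: Span,
    reference_id: ReferenceId,
}

// ...and the name would live on the Reference the ID points to.
struct Reference<'a> {
    name: &'a str,
    // symbol_id, flags, etc.
}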

overlookmotel commented 5 months ago

This is interesting: https://github.com/xacrimon/dashmap/pull/287

We are commonly hashing Atoms when e.g. building the scope tree. We could possibly avoid repeated hashing and memory accesses by storing the hash inline in Atom.

overlookmotel commented 4 months ago

We should also create our own version of Cow, perhaps modelled on beef.
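
For context, the rough idea behind a beef-style Cow is to encode borrowed vs owned in a capacity field instead of an enum tag, so the whole thing fits in 3 words rather than std::borrow::Cow<str>'s 4 on 64-bit. A purely illustrative sketch (not beef's actual API):

use std::{marker::PhantomData, ptr::NonNull};

// capacity == 0 means "borrowed from the arena"; non-zero means "owned heap allocation".
struct CompactCow<'a> {
    ptr: NonNull<u8>,
    len: usize,
    capacity: usize,
    _marker: PhantomData<&'a str>,
}

impl<'a> CompactCow<'a> {
    fn borrowed(s: &'a str) -> Self {
        Self {
            ptr: NonNull::from(s.as_bytes()).cast(),
            len: s.len(),
            capacity: 0,
            _marker: PhantomData,
        }
    }

    fn is_borrowed(&self) -> bool {
        self.capacity == 0
    }
}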

overlookmotel commented 2 months ago

Further thought:

I think we need a separate type Ident<'a> for identifiers which are commonly used as keys in hash maps (bindings and references, basically).

Ident<'a> should have the invariant that it is not an empty string. Then we can provide cheap Ident::first_char, Ident::last_char, Ident::first_byte, and Ident::last_byte methods which have no bounds checks.

Anatomy of an Ident

Ident<'a> is essentially a &'a str, except with a max len of u32::MAX, and with its hash stored inline. If we use a 32-bit hash:

use std::{marker::PhantomData, ptr::NonNull};

// 16 bytes on 64-bit: 8 (ptr) + 4 (len) + 4 (hash)
struct Ident<'a> {
    ptr: NonNull<u8>,
    len: u32,
    hash: u32,
    _marker: PhantomData<&'a str>,
}

PartialEq::eq becomes very cheap for non-equal Idents: if lengths or hashes do not match, there's no need to read or compare the actual strings.

Hash::hash becomes just a read of the hash field (extremely cheap).
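
For illustration, those two impls might look roughly like this (assuming the Ident struct above, plus an as_str helper which is itself an assumption, not final code):

impl<'a> Ident<'a> {
    // Assumed helper: rebuild the &str from ptr + len.
    fn as_str(&self) -> &'a str {
        unsafe {
            std::str::from_utf8_unchecked(std::slice::from_raw_parts(
                self.ptr.as_ptr(),
                self.len as usize,
            ))
        }
    }
}

impl PartialEq for Ident<'_> {
    fn eq(&self, other: &Self) -> bool {
        // Cheap rejects first; only touch string data when len and hash both match.
        self.len == other.len && self.hash == other.hash && self.as_str() == other.as_str()
    }
}

impl std::hash::Hash for Ident<'_> {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        // Just feed the pre-computed hash; no string traversal.
        state.write_u32(self.hash);
    }
}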

The parser would generate Idents and store them in AST where we currently have Atoms.

Many identifiers already have fairly expensive string comparisons in the lexer, in order to check if they are keywords (await etc). Calculating the hash will probably not cost much more than these checks already do.

Possibly, we could make the hash function a perfect hash for these keywords to further increase the speed of keyword checks.

An ident! macro for static Idents would pre-calculate hashes at build time for common identifiers. The macro would be generated either in a build script or by codegen.

impl Ident<'static> {
    const fn new_const_hashed(s: &'static str, hash: u32) -> Self {
        assert!(s.len() <= u32::MAX as usize);
        Self {
            // SAFETY: the data pointer of a &str is never null.
            ptr: unsafe { NonNull::new_unchecked(std::ptr::from_ref(s).cast::<u8>().cast_mut()) },
            len: s.len() as u32,
            hash,
            _marker: PhantomData,
        }
    }
}

macro_rules! ident {
    (React) => { Ident::new_const_hashed("React", 0xF8506214) };
    (createElement) => { Ident::new_const_hashed("createElement", 0x4192A408) };
    // etc
}

let react_ident = ident!(React);

Interning

We could also intern idents (in local stores, 1 store per AST, not Sync, so no locks).

IdentId would be a 32-bit ID (like SymbolId etc). This is what would be stored in AST where we currently have Atom<'a>.

The actual Idents would be stored in a side array IndexVec<IdentId, Ident> with a HashMap<Ident, IdentId> to look up IdentId for any string.

In parser, identifiers would be converted to IdentIds by hashing them and then looking up IdentId in the hash map. The same identifier name would always be resolved to the same IdentId.

Now comparing two identifiers is a very cheap comparison of two u32s, not an expensive string comparison. And hash maps can be keyed by IdentId rather than strings. Hashing a u32 with FxHash is only 3 CPU ops, making hash map lookups really cheap, and rehashing when a hash map grows is also much less expensive.
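
A simplified sketch of that interning scheme, using plain std collections in place of IndexVec and FxHashMap to keep it self-contained:

use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct IdentId(u32);

#[derive(Default)]
struct IdentInterner<'a> {
    // Stand-in for IndexVec<IdentId, Ident<'a>>.
    idents: Vec<&'a str>,
    // Look up the IdentId for any string.
    lookup: HashMap<&'a str, IdentId>,
}

impl<'a> IdentInterner<'a> {
    fn intern(&mut self, name: &'a str) -> IdentId {
        if let Some(&id) = self.lookup.get(name) {
            return id;
        }
        let id = IdentId(self.idents.len() as u32);
        self.idents.push(name);
        self.lookup.insert(name, id);
        id
    }

    fn resolve(&self, id: IdentId) -> &'a str {
        self.idents[id.0 as usize]
    }
}

// The same name always resolves to the same IdentId, so comparing identifiers
// becomes a u32 comparison:
// assert_eq!(interner.intern("foo"), interner.intern("foo"));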

overlookmotel commented 1 month ago

Also look into this for string comparisons: https://crates.io/crates/fastcmp

Unclear how much faster than std it is for short strings (which are the majority of JS identifiers).