oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
12.36k stars 450 forks source link

Memory leak in arena with `Atom`s #1803

Closed overlookmotel closed 9 months ago

overlookmotel commented 10 months ago

OXC appears to leak memory when source includes any strings over 24 bytes in length.

A demonstration here: https://github.com/overlookmotel/oxc/tree/leak-test

Checkout that branch, and cargo run -p test_leak.

The problem

I believe the problem is this:

In the demo above, I've added impl Drop for Atom which logs when an Atom is dropped. It never gets called, demonstrating that the heap memory is leaked.

Possible solutions

1. Add impl Drop to Box

This would be like bumpalo::collections::Box.

Problem is that this would make dropping the AST more expensive, negating a major benefit of using an arena.

2. Track and drop Atoms manually

Store pointers to all non-inline Atoms as they're created. A Drop impl on the arena could then drop them when the arena is dropped.

3. Replace CompactString

Replace CompactString with a similar "small string" implementation which stores long strings in the bumpalo arena, instead of the general heap.

As well as solving the leak, I believe this would also be more performant:

  1. Less heap allocations.
  2. Strings would be stored close to the AST nodes containing them, producing better cache locality.

CompactString is quite complicated, but OXC only uses a small fraction of CompactString's capabilities, so creating a custom small string implementation in OXC I don't think would be too tricky.

In addition:

CompactString is built as a replacement for std's String, and includes a capacity field to allow resizing. But OXC's Atom is immutable, so the capacity field is not required. All that's needed is a Box<str> equivalent.

So a new "small string" type could lose the capacity field and be reduced in size to 16 bytes, rather than the current 24. Obviously, only strings up to 16 bytes could then be stored inline, but still most JS identifiers are under 16 characters, and if string content is stored in the arena, storing longer strings "out of line" has less performance penalty.

Further optimizations

For identifers without escapes (pretty much all identifiers) and string literals without escape sequences (most strings), Atom could just be a pointer to the string in source code, rather than copying the string into the arena.

Secondly, when lexing strings and identifiers containing escape sequences, AutoCow creates a string in the arena. That string can be pointed to by Atom rather than copied to another location in the arena.

In fact, escaped strings for some reason currently are stored in the arena three times, though I'm not sure why. Dump of arena contents after parsing x = 'ABCDE\n';:

|0c000000 00000000 486d602e 01000000| ........Hm`..... 00000000
|00000000 0e000000 00000000 00000000| ................ 00000010
|00000000 00000000 b06d602e 01000000| .........m`..... 00000020
|05000000 00000000 806d602e 01000000| .........m`..... 00000030
|00000000 0d000000 0098aa6f 01000000| ...........o.... 00000040
|04000000 0d000000 41424344 450a0000| ........ABCDE... 00000050
|00000000 00000000 00000000 000000c6| ................ 00000060
|f0010000 00004142 4344450a 42434445| ......ABCDE.BCDE 00000070
|00000000 01000000 00000000 01000000| ................ 00000080
|78000000 00000000 00000000 00000000| x............... 00000090
|00000000 000000c1 0090aa6f 01000000| ...........o.... 000000a0
overlookmotel commented 10 months ago

For similar reasons, BigIntLiteral also leaks memory because num_bigint::BigInt stores data on the heap.

Any other structures which store data on the heap will also leak, though I couldn't see any other cases at first glance.

Boshen commented 10 months ago

Thank you for the in-depth research.

For strings, my previous attempt was https://github.com/oxc-project/oxc/issues/451, I think we can look into this problem space a bit more to find a good solution.

For big ints, the usages are rare - perhaps we could box it into the arena?

@overlookmotel Would you like to explore and find a way to mitigate these memory leak problems with us? Thank you in advance 🙇

Boshen commented 10 months ago

Let's start with option 3 (replace compact string).

I still want to try the hack from https://github.com/rust-lang/rust/blob/f9097f87c9c094f80826fb60a1a624b5f9f1ed82/compiler/rustc_span/src/symbol.rs#L1999-L2007.

Boshen commented 10 months ago

In fact, escaped strings for some reason currently are stored in the arena three times, though I'm not sure why. Dump of arena contents after parsing x = 'ABCDE\n';:

This is going to be fun bug ... AutoCow is copied from jsparagus https://github.com/mozilla-spidermonkey/jsparagus/blob/212f6bdbc2cae909e7d5cfebf36284560c3c4ef4/crates/parser/src/lexer.rs#L2256

Maybe there is a usage problem in our lexer code?

overlookmotel commented 10 months ago

First go at a new Atom implementation here: https://github.com/overlookmotel/oxc/tree/new-atom/crates/oxc_atom (lots of TODO comments)

A few questions:

Mutablility

Are there plans/potential needs in future for Atoms to be mutable?

What I've done so far assumes not, and therefore can reduce size of Atom to 16 bytes, and make the implementation much simpler and more performant.

Use references?

Should all strings be added to the arena? Or if there's an existing reference to the string's content whose lifetime is long enough, can Atom just store a reference to it e.g. if the string in the source code being parsed?

Advantages of storing references:

  1. Lower memory usage.
  2. Parsing should be faster as the majority of strings and identifiers can just be pointed to in source code, so copying strings into the arena will be much reduced.
  3. Cloning Atom becomes very cheap (no copying string content).

Advantage of putting everything in the arena is that strings will always be close in arena to the AST nodes which use them, so better cache locality. Then again, strings also bloat the arena, so possibly result in more swapping in and out of the cache while traversing an AST.

My current WIP is based on storing references.

Support for 32-bit systems

How important is support for 32-bit systems?

What I've done so far produces a maximum cap on length of any individual string of 512 MiB on 32-bit machines (no practical limit for 64-bit). CompactString has a workaround for that limit, but implementing it here would be horrible, as it doesn't mesh well with arenas.

Personally, I imagine 512 MiB is big enough for almost all use cases. And does anyone actually use 32-bit machines any more?

overlookmotel commented 10 months ago

On your other points:

I'm not sure I understand the point of the hack you mentioned in this context. As far as I can see, the reason that code is tricking the compiler that strings have a 'static lifetime is in order to use them as keys in a hash map.

It needs to do that because it's de-duplicating strings. But I thought you'd found that approach was slower due to the locks and hash table lookups, and discounted it?

Or am I missing the point?

I figured out why string content appears 3 times in the arena. It's because bumpalo has the unfortunate property that it grows downward. i.e. in each chunk the first allocation is at end of the chunk, the next allocation before it, and so on.

So when AutoCow pushes to the bumpalo::collections::Vec, each time the Vec has to resize, it can't grow forwards and instead creates a new allocation earlier in the chunk, copies the data earlier, and carries on. The 3rd incidence of the string I noticed was the leftover from that process.

I imagine this same effect also occurs when building any other Vec in the parser e.g. Vec<Statement>. It probably eats a lot of space in the arena, and results in a lot of memory copying, since Vecs are so common in ASTs.

My first thought is this could be improved on by using a separate "scratch space" allocation which can grow forwards.

Boshen commented 10 months ago

Are there plans/potential needs in future for Atoms to be mutable?

No, I haven't encountered a usecase where it needs to be mutable.

Should all strings be added to the arena? Or if there's an existing reference to the string's content whose lifetime is long enough, can Atom just store a reference to it

I wanted to do this for a long time. But handling out &'a str or String<'a> to my downstream tools complicates their code - adding 'a to their code is too anonoying, so I went for a lifetimeless implementation.

If we can hack it and unsafe the 'a, then I'm all in.

how important is support for 32-bit systems?

Your reasoning is solid around 512 MiB 👍

I'm not sure I understand the point of the hack

It's basically this comment:

// SAFETY: we can extend the arena allocation to `'static` because we
// only access these while the arena is still alive.

removing the lifetime will make all downstream code easier to write, since people don't need to carry around the 'a everywhere.

Although we can force an allocation for external use ... so everything is still safe Rust.

I figured out why string content appears 3 times in the arena.

TIL. We can create a separate discussion around this topic. Could be fun to optimize it, or perhaps wasting a lot of time 😅

overlookmotel commented 10 months ago

Thanks for coming back.

I wanted to do this for a long time. But handling out &'a str or String<'a> to my downstream tools complicates their code - adding 'a to their code is too anonoying, so I went for a lifetimeless implementation.

If we can hack it and unsafe the 'a, then I'm all in.

Hmm. Well yes we could hack it, but then wouldn't it be unsound? You could drop the String containing the source code, and continue using the AST which contains references to the string's contents, which would be a use-after-free.

I'm also a little confused as to why Atom<'a> having a lifetime is problematic. Most AST node types have an 'a lifetime already, so why is Atom<'a>'s lifetime more annoying than Statement<'a>'s?

Also, NumberLiteral's raw field is a &'a str.

If the problem is that the source code is stored separately from the arena, so it produces a 2nd lifetime to deal with, how about Allocator storing an immutable reference to the source code buffer along with the bumpalo arena? That'd produce an invariant that the source code buffer must live longer than the arena, and then a consumer only has 1 lifetime to deal with. i.e.:

let src = "x = 123".to_string();
let alloc = Allocator::from_src(&src);
// `src` must live as long as `alloc`

I think it'd be a shame to leave this potential performance gain unrealised, as it could be significant for any code containing long strings (which would include a lot of UI code).

Out of interest, what downstream tools are you talking about? Other OXC crates, or external consumers?

I figured out why string content appears 3 times in the arena.

TIL. We can create a separate discussion around this topic. Could be fun to optimize it, or perhaps wasting a lot of time 😅

Solving this problem in AutoCow should not be difficult. The broader problem with Vec is more tricky, and yes, separate issue.

Boshen commented 10 months ago

How about this - we can expose different APIs for different usages, i.e.

Out of interest, what downstream tools are you talking about? Other OXC crates, or external consumers?

All crates other than the AST crate are considered external crates:


I love the overall direction this issue is going! @overlookmotel do you wanna find a way to hack some of the code so we can get a PR going, and observe performance changes?

Both performance and memory consumption should be improved in theory.

As stated in the policy https://oxc-project.github.io/docs/contribute/rules.html#development-policy

All performance issues (runtime and compilation speed) are considered as bugs in this project.

;-)

Boshen commented 10 months ago

I think I've provided enough context for you to make good judgements, you've gained my trust.

Let's break things and have fun!

overlookmotel commented 10 months ago

you've gained my trust.

Thanks. Or maybe I've just worn you down with my questions!

But yes, I think I have a fairly clear picture now.

Just one clarification, and 1 last question:

  • -> String
  • -> &'a str
  • -> CompactString

To clarify, you mean that Atom<'a> can have a lifetime, but it should provide methods for easy conversion to lifetime-less structures for consumers who need them e.g. impl Atom<'a> { fn to_string(&self) -> String }?

And the last question:

Am I right in thinking that OXC places a limit on size of a source file to max u32::MAX bytes (4 GiB)? (I'm gathering that from Span's start and end being u32s). Therefore string length can be capped at u32::MAX too?

That'd free 4 bytes in Atom for additional metadata (which I have some ideas how to use).

Boshen commented 10 months ago

To clarify, you mean that Atom<'a> can have a lifetime, but it should provide methods for easy conversion to lifetime-less structures for consumers who need them e.g. impl Atom<'a> { fn to_string(&self) -> String }?

Yes, I think we should allow the user (other crates) to pick for their use case so everybody can make their own tradeoffs, as well as stay in safe Rust.

Boshen commented 10 months ago

Am I right in thinking that OXC places a limit on size of a source file to max u32::MAX bytes (4 GiB)? (I'm gathering that from Span's start and end being u32s). Therefore string length can be capped at u32::MAX too?

Yes!

https://github.com/oxc-project/oxc/blob/19577709dbcfc9761cfcb1b39ec7f3ebdc810285/crates/oxc_span/src/span.rs#L14

overlookmotel commented 10 months ago

OK great. Thanks for the answers.

I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I'll have much less time to work on it. Please excuse me if I don't come back with a PR for a while.

Boshen commented 10 months ago

OK great. Thanks for the answers.

I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I'll have much less time to work on it. Please excuse me if I don't come back with a PR for a while.

Take your time, I'm on holiday as well 😁

overlookmotel commented 10 months ago

Sorry, one more question: How much should we care about 32-bit systems? Are they a primary target for OXC?

Reason I ask is implementation would be simpler if could make Atom 12 bytes on 32-bit systems (rather than 8 bytes).

On 64-bit systems it'll be 16 bytes regardless.

Boshen commented 10 months ago

Sorry, one more question: How much should we care about 32-bit systems? Are they a primary target for OXC?

Reason I ask is implementation would be simpler if could make Atom 12 bytes on 32-bit systems (rather than 8 bytes).

On 64-bit systems it'll be 16 bytes regardless.

Not that much. Not a primary target.

overlookmotel commented 10 months ago

Thanks. I'll get on with the implementation. Will come back if more questions, but hopefully I have all the info I need now.

Boshen commented 10 months ago

Why do we need Atom<'a> in the first place if we replace everything to just &'a str on all the AST nodes? Maybe I over-designed it ... we don't actually need to intern nor a lifetimeless string form 🤔

If downstream tool needs a lifetimeless string, they should be able to use CompactStr::from?

overlookmotel commented 10 months ago

Yes, I was thinking along similar lines last night.

&'a str is certainly simpler, and would be faster in the parser, as no mucking about figuring out if whether to store a string inline or not.

But I can still see a few advantages to an Atom<'a> type which stores short strings inline:

  1. Faster reading strings

When printing an AST back to Javascript, when the printer holds an Atom, if if the string is short enough to be stored inline, then no further memory lookup is required to access it.

This would be the case for most identifiers, as typically they'll be under 16 bytes.

  1. Faster comparisons

During e.g. scope analysis, I'd imagine there's a lot of comparing one identifier's name to another's. When those identifiers are 16 chars or less, that should be faster if they're stored inline because:

a. The data is right there inline in the Atom. Likely to be in cache. b. You don't need to worry about reading out of bounds the way you do with a &str, so comparing two inline Atoms is equivalent to comparing two u128s. With AVX, that's only 3 instructions (https://godbolt.org/z/Gvoqc7PYd)

  1. Space for extra data

&str uses 8 bytes for string length. In the context of OXC, we only need 4 bytes, as a string can't be larger than 4 GiB.

A custom Atom type can use those 4 spare bytes for something else. They could, for example, contain a flag for "is ASCII", or the string length in UTF16 characters (to help with #959).

(1) and (2) are the main considerations. Basically, it's a trade-off between speed of creating strings and of reading them. Maybe &str will turn out to be the better choice after all, but maybe not.

I'm most of the way there with the inline Atom implementation, so if you can wait the month or so it'll take me to finish it up (this month is hellish at work, but things will calm down in Feb), we can benchmark the 2 options and see what that reveals.

Does that sound like a reasonable plan?

Boshen commented 10 months ago

Does that sound like a reasonable plan?

Yes yes! I totally forgot that all we want is 😅

enum Atom {
  Inline(..),
  ArenaAllocated(..)
}
overlookmotel commented 10 months ago

@Boshen I don't think this should have been closed. Github seems to have misinterpretted the commit message on dad94f7ddede9d7308df8c92675cfff33a6266b5.

overlookmotel commented 10 months ago

Thanks!

aapoalas commented 10 months ago

:+1: I just ran into this with parsing a trivial script containing only 3n using MIRI.

Boshen commented 10 months ago

👍 I just ran into this with parsing a trivial script containing only 3n using MIRI.

@aapoalas Do you mind sharing your miri setup? I'd like to add a CI for it.

My previous attempt wasn't robust enough https://github.com/oxc-project/oxc/blob/main/.github/workflows/miri.yml

aapoalas commented 10 months ago

👍 I just ran into this with parsing a trivial script containing only 3n using MIRI.

@aapoalas Do you mind sharing your miri setup? I'd like to add a CI for it.

My previous attempt wasn't robust enough https://github.com/oxc-project/oxc/blob/main/.github/workflows/miri.yml

Unfortunately I only just ran the rustup script to add nightly toolchain, and then ran cargo +nightly miri test, nothing more than that. It doesn't seem particularly more robust than your workflow.

Boshen commented 10 months ago

Weird, I tried cargo miri test -p oxc_parser -p oxc_ast -p oxc_allocator -p oxc_semantic in CI and it passed 🤔 https://github.com/oxc-project/oxc/actions/runs/7529087020/job/20492604930

If I run the whole test suite it throws something from crossbeam

   --> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-epoch-0.9.17/src/atomic.rs:204:11
    |
204 |         &*(ptr as *const T)
    |           ^^^^^^^^^^^^^^^^^ integer-to-pointer cast
    |
    = help: This program is using integer-to-pointer casts or (equivalently) `ptr::from_exposed_addr`,
    = help: which means that Miri might miss pointer bugs in this program.
    = help: See https://doc.rust-lang.org/nightly/std/ptr/fn.from_exposed_addr.html for more details on that operation.
    = help: To ensure that Miri does not miss bugs in your program, use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead.
    = help: You can then pass the `-Zmiri-strict-provenance` flag to Miri, to ensure you are not relying on `from_exposed_addr` semantics.
    = help: Alternatively, the `-Zmiri-permissive-provenance` flag disables this warning.

https://github.com/oxc-project/oxc/actions/runs/7528975555/job/20492273972

overlookmotel commented 10 months ago

Is it possible that the tests don't create any AST which contains either BigInts, or strings/identifiers longer than 24 bytes? BigInt always heap allocates, but CompactStr stores strings of up to 24 bytes inline, and only heap allocates for strings longer than that.

If the tests don't do either of these things, they won't trigger the memory leaks.

I imagine running the parser on the files used for the conformance tests would allow miri to find the leaks.

Boshen commented 10 months ago

Thank you both. Replicated

https://github.com/oxc-project/oxc/actions/runs/7530649712/job/20497529584

error: memory leaked: alloc57077 (Rust heap, size: 8, align: 8), allocated here:
    --> /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
     |
98   |         __rust_alloc(layout.size(), layout.align())
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     |
     = note: inside `std::alloc::alloc` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
     = note: inside `std::alloc::Global::alloc_impl` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
     = note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
     = note: inside `alloc::raw_vec::RawVec::<u64>::allocate_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:199:45: 199:67
     = note: inside `alloc::raw_vec::RawVec::<u64>::with_capacity_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:145:9: 145:69
     = note: inside `std::vec::Vec::<u64>::with_capacity_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:672:20: 672:61
     = note: inside `std::vec::Vec::<u64>::with_capacity` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:481:9: 481:49
     = note: inside `num_bigint::biguint::convert::from_radix_digits_be` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/biguint/convert.rs:122:20: 122:74
     = note: inside `num_bigint::biguint::convert::<impl num_traits::Num for oxc_ast::BigUint>::from_str_radix` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/biguint/convert.rs:269:13: 269:44
     = note: inside `num_bigint::bigint::convert::<impl num_traits::Num for num_bigint::BigInt>::from_str_radix` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/bigint/convert.rs:39:18: 39:51
     = note: inside `num_bigint::BigInt::parse_bytes` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/bigint.rs:675:9: 675:41
note: inside `lexer::number::parse_big_int`
    --> crates/oxc_parser/src/lexer/number.rs:97:5
     |
97   |     BigInt::parse_bytes(s.as_bytes(), radix).ok_or("invalid bigint")
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Boshen commented 9 months ago

Found some arcane code from ratel, where we can use for our atom:

https://github.com/ratel-rust/ratel-core/blob/4929f0ecdcd470346c9dba5d7f570eb960cc312d/core/src/ast/ident.rs#L128-L135

overlookmotel commented 9 months ago

Here's where I'm up to with Atom so far: https://github.com/overlookmotel/oxc/tree/new-atom/crates/oxc_atom/src

It's heavily based on CompactStr, and I've gone fairly deep figuring out how all their optimizations work (and also finding a couple of my own).

But thanks for link to Ratel's implementation - I'll see if anything we can use from it which is specific to strings in JS.

Working on Atom has also thrown up some thoughts which are a bit broader about the parser/lexer overall, and would be good to resolve before locking Atom's API. For various reasons, I think API should be more constrained than what I have now. I'll raise issues about these questions when I get time.

One question for now: How relaxed are you about breaking changes?

I mean: Is it better to work out API design thorougly before implementing, so the breaking changes happens all in one go? Or is it OK to merge an initial implementation (including breaking changes) and then modify API again later (more breaking changes)?

overlookmotel commented 9 months ago

Also, a naming thing: Should we change the name from Atom to just String?

The name Atom suggest that identical strings are deduplicated. That used to be that case, but it isn't any more, so the name is now misleading.

Boshen commented 9 months ago

Also, a naming thing: Should we change the name from Atom to just String?

Can do since we are not 1.0. But there are many projects that dependents on oxc now, so we'll need to introduce some type alias and the #[deprecated] macro to smooth out the transition.

aapoalas commented 9 months ago

Agree that a deprecation period is nice. Our JS engine uses Atom frankly a bit too much. We're just a hobby project so it doesn't really matter that we break though.

overlookmotel commented 9 months ago

OK, understood. How about breaking changes to the methods of the type?

The simplest starting point is to have these methods to create an Atom/String:

fn new(s: &'a str) -> Atom<'a>;
fn new_in(s: &'a str, alloc: &'a Allocator) -> Atom<'a>;
// Limited to 16 bytes max
const fn new_const(s: &'static str) -> Atom<'static>;

But I think in future it'd be better to remove new() and replace with:

fn new_from_source(SourceSlice<'a>) -> Atom<'a>;

SourceSlice here is a string slice which is guaranteed to be part of the source code. Constraining Atoms to only be able to reference string content which is either in source code or stored in the arena will have benefits both in performance and functionality.

So my question is: Is it a problem to do the initial implementation first, and then have the breaking change of removing new() later as a further breaking change? (NB deprecation would not be possible on that change, as it would be fundamental to Atom's internals).

Boshen commented 9 months ago

Is it a problem to do the initial implementation first, and then have the breaking change of removing new() later as a further breaking change?

Not a problem. We'll need to keep a change log from now on :-)

overlookmotel commented 9 months ago

OK great. I'm working on something else at the moment, but will come back to this soon.

overlookmotel commented 9 months ago

See comment on #2101:

49% of the entire run time lexing checker.ts is spent in bumpalo::collections::string::String::push.

If we can replace usage of Bumpalo's String with something else, that'd produce an 10-15% speed-up in parsing some inputs.

overlookmotel commented 9 months ago

Previous comment turned out to be erroneous. But discovered a couple of other things along the way which are relevant to this issue:

  1. AutoCow is inefficient and could be improved by pushing non-escaped chunks as string slices, rather than pushing many chars 1 by 1.
  2. Bumpalo's String::push_str is extremely slow for similar reasons - it pushes each byte of the string 1 by 1, rather than copying a big chunk all at once.