Closed overlookmotel closed 9 months ago
For similar reasons, BigIntLiteral
also leaks memory because num_bigint::BigInt stores data on the heap.
Any other structures which store data on the heap will also leak, though I couldn't see any other cases at first glance.
Thank you for the in-depth research.
For strings, my previous attempt was https://github.com/oxc-project/oxc/issues/451, I think we can look into this problem space a bit more to find a good solution.
For big ints, the usages are rare - perhaps we could box it into the arena?
@overlookmotel Would you like to explore and find a way to mitigate these memory leak problems with us? Thank you in advance 🙇
Let's start with option 3 (replace compact string).
I still want to try the hack from https://github.com/rust-lang/rust/blob/f9097f87c9c094f80826fb60a1a624b5f9f1ed82/compiler/rustc_span/src/symbol.rs#L1999-L2007.
In fact, escaped strings for some reason currently are stored in the arena three times, though I'm not sure why. Dump of arena contents after parsing x = 'ABCDE\n';:
This is going to be fun bug ... AutoCow
is copied from jsparagus https://github.com/mozilla-spidermonkey/jsparagus/blob/212f6bdbc2cae909e7d5cfebf36284560c3c4ef4/crates/parser/src/lexer.rs#L2256
Maybe there is a usage problem in our lexer code?
First go at a new Atom
implementation here: https://github.com/overlookmotel/oxc/tree/new-atom/crates/oxc_atom
(lots of TODO comments)
A few questions:
Are there plans/potential needs in future for Atom
s to be mutable?
What I've done so far assumes not, and therefore can reduce size of Atom
to 16 bytes, and make the implementation much simpler and more performant.
Should all strings be added to the arena? Or if there's an existing reference to the string's content whose lifetime is long enough, can Atom
just store a reference to it e.g. if the string in the source code being parsed?
Advantages of storing references:
Atom
becomes very cheap (no copying string content).Advantage of putting everything in the arena is that strings will always be close in arena to the AST nodes which use them, so better cache locality. Then again, strings also bloat the arena, so possibly result in more swapping in and out of the cache while traversing an AST.
My current WIP is based on storing references.
How important is support for 32-bit systems?
What I've done so far produces a maximum cap on length of any individual string of 512 MiB on 32-bit machines (no practical limit for 64-bit). CompactString
has a workaround for that limit, but implementing it here would be horrible, as it doesn't mesh well with arenas.
Personally, I imagine 512 MiB is big enough for almost all use cases. And does anyone actually use 32-bit machines any more?
On your other points:
I'm not sure I understand the point of the hack you mentioned in this context. As far as I can see, the reason that code is tricking the compiler that strings have a 'static
lifetime is in order to use them as keys in a hash map.
It needs to do that because it's de-duplicating strings. But I thought you'd found that approach was slower due to the locks and hash table lookups, and discounted it?
Or am I missing the point?
I figured out why string content appears 3 times in the arena. It's because bumpalo has the unfortunate property that it grows downward. i.e. in each chunk the first allocation is at end of the chunk, the next allocation before it, and so on.
So when AutoCow
pushes to the bumpalo::collections::Vec
, each time the Vec
has to resize, it can't grow forwards and instead creates a new allocation earlier in the chunk, copies the data earlier, and carries on. The 3rd incidence of the string I noticed was the leftover from that process.
I imagine this same effect also occurs when building any other Vec
in the parser e.g. Vec<Statement>
. It probably eats a lot of space in the arena, and results in a lot of memory copying, since Vec
s are so common in ASTs.
My first thought is this could be improved on by using a separate "scratch space" allocation which can grow forwards.
Are there plans/potential needs in future for Atoms to be mutable?
No, I haven't encountered a usecase where it needs to be mutable.
Should all strings be added to the arena? Or if there's an existing reference to the string's content whose lifetime is long enough, can Atom just store a reference to it
I wanted to do this for a long time. But handling out &'a str
or String<'a>
to my downstream tools complicates their code - adding 'a
to their code is too anonoying, so I went for a lifetimeless implementation.
If we can hack it and unsafe the 'a
, then I'm all in.
how important is support for 32-bit systems?
Your reasoning is solid around 512 MiB 👍
I'm not sure I understand the point of the hack
It's basically this comment:
// SAFETY: we can extend the arena allocation to `'static` because we
// only access these while the arena is still alive.
removing the lifetime will make all downstream code easier to write, since people don't need to carry around the 'a
everywhere.
Although we can force an allocation for external use ... so everything is still safe Rust.
I figured out why string content appears 3 times in the arena.
TIL. We can create a separate discussion around this topic. Could be fun to optimize it, or perhaps wasting a lot of time 😅
Thanks for coming back.
I wanted to do this for a long time. But handling out &'a str or String<'a> to my downstream tools complicates their code - adding 'a to their code is too anonoying, so I went for a lifetimeless implementation.
If we can hack it and unsafe the 'a, then I'm all in.
Hmm. Well yes we could hack it, but then wouldn't it be unsound? You could drop the String
containing the source code, and continue using the AST which contains references to the string's contents, which would be a use-after-free.
I'm also a little confused as to why Atom<'a>
having a lifetime is problematic. Most AST node types have an 'a
lifetime already, so why is Atom<'a>
's lifetime more annoying than Statement<'a>
's?
Also, NumberLiteral
's raw
field is a &'a str
.
If the problem is that the source code is stored separately from the arena, so it produces a 2nd lifetime to deal with, how about Allocator
storing an immutable reference to the source code buffer along with the bumpalo arena? That'd produce an invariant that the source code buffer must live longer than the arena, and then a consumer only has 1 lifetime to deal with. i.e.:
let src = "x = 123".to_string();
let alloc = Allocator::from_src(&src);
// `src` must live as long as `alloc`
I think it'd be a shame to leave this potential performance gain unrealised, as it could be significant for any code containing long strings (which would include a lot of UI code).
Out of interest, what downstream tools are you talking about? Other OXC crates, or external consumers?
I figured out why string content appears 3 times in the arena.
TIL. We can create a separate discussion around this topic. Could be fun to optimize it, or perhaps wasting a lot of time 😅
Solving this problem in AutoCow
should not be difficult. The broader problem with Vec
is more tricky, and yes, separate issue.
How about this - we can expose different APIs for different usages, i.e.
-> String
-> &'a str
-> CompactString
Out of interest, what downstream tools are you talking about? Other OXC crates, or external consumers?
All crates other than the AST crate are considered external crates:
oxc_linter
reads strings everywhereoxc_semantic
exposes a bunch of public facing APIs with Atoms - https://github.com/oxc-project/oxc/blob/main/crates/oxc_syntax/src/module_record.rsI love the overall direction this issue is going! @overlookmotel do you wanna find a way to hack some of the code so we can get a PR going, and observe performance changes?
Both performance and memory consumption should be improved in theory.
As stated in the policy https://oxc-project.github.io/docs/contribute/rules.html#development-policy
All performance issues (runtime and compilation speed) are considered as bugs in this project.
;-)
I think I've provided enough context for you to make good judgements, you've gained my trust.
Let's break things and have fun!
you've gained my trust.
Thanks. Or maybe I've just worn you down with my questions!
But yes, I think I have a fairly clear picture now.
Just one clarification, and 1 last question:
- -> String
- -> &'a str
- -> CompactString
To clarify, you mean that Atom<'a>
can have a lifetime, but it should provide methods for easy conversion to lifetime-less structures for consumers who need them
e.g. impl Atom<'a> { fn to_string(&self) -> String }
?
And the last question:
Am I right in thinking that OXC places a limit on size of a source file to max u32::MAX
bytes (4 GiB)? (I'm gathering that from Span
's start
and end
being u32
s). Therefore string length can be capped at u32::MAX
too?
That'd free 4 bytes in Atom
for additional metadata (which I have some ideas how to use).
To clarify, you mean that Atom<'a> can have a lifetime, but it should provide methods for easy conversion to lifetime-less structures for consumers who need them e.g. impl Atom<'a> { fn to_string(&self) -> String }?
Yes, I think we should allow the user (other crates) to pick for their use case so everybody can make their own tradeoffs, as well as stay in safe Rust.
Am I right in thinking that OXC places a limit on size of a source file to max u32::MAX bytes (4 GiB)? (I'm gathering that from Span's start and end being u32s). Therefore string length can be capped at u32::MAX too?
Yes!
OK great. Thanks for the answers.
I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I'll have much less time to work on it. Please excuse me if I don't come back with a PR for a while.
OK great. Thanks for the answers.
I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I'll have much less time to work on it. Please excuse me if I don't come back with a PR for a while.
Take your time, I'm on holiday as well 😁
Sorry, one more question: How much should we care about 32-bit systems? Are they a primary target for OXC?
Reason I ask is implementation would be simpler if could make Atom
12 bytes on 32-bit systems (rather than 8 bytes).
On 64-bit systems it'll be 16 bytes regardless.
Sorry, one more question: How much should we care about 32-bit systems? Are they a primary target for OXC?
Reason I ask is implementation would be simpler if could make
Atom
12 bytes on 32-bit systems (rather than 8 bytes).On 64-bit systems it'll be 16 bytes regardless.
Not that much. Not a primary target.
Thanks. I'll get on with the implementation. Will come back if more questions, but hopefully I have all the info I need now.
Why do we need Atom<'a>
in the first place if we replace everything to just &'a str
on all the AST nodes? Maybe I over-designed it ... we don't actually need to intern nor a lifetimeless string form 🤔
If downstream tool needs a lifetimeless string, they should be able to use CompactStr::from
?
Yes, I was thinking along similar lines last night.
&'a str
is certainly simpler, and would be faster in the parser, as no mucking about figuring out if whether to store a string inline or not.
But I can still see a few advantages to an Atom<'a>
type which stores short strings inline:
When printing an AST back to Javascript, when the printer holds an Atom
, if if the string is short enough to be stored inline, then no further memory lookup is required to access it.
This would be the case for most identifiers, as typically they'll be under 16 bytes.
During e.g. scope analysis, I'd imagine there's a lot of comparing one identifier's name to another's. When those identifiers are 16 chars or less, that should be faster if they're stored inline because:
a. The data is right there inline in the Atom
. Likely to be in cache.
b. You don't need to worry about reading out of bounds the way you do with a &str
, so comparing two inline Atom
s is equivalent to comparing two u128
s. With AVX, that's only 3 instructions (https://godbolt.org/z/Gvoqc7PYd)
&str
uses 8 bytes for string length. In the context of OXC, we only need 4 bytes, as a string can't be larger than 4 GiB.
A custom Atom
type can use those 4 spare bytes for something else. They could, for example, contain a flag for "is ASCII", or the string length in UTF16 characters (to help with #959).
(1) and (2) are the main considerations. Basically, it's a trade-off between speed of creating strings and of reading them. Maybe &str
will turn out to be the better choice after all, but maybe not.
I'm most of the way there with the inline Atom
implementation, so if you can wait the month or so it'll take me to finish it up (this month is hellish at work, but things will calm down in Feb), we can benchmark the 2 options and see what that reveals.
Does that sound like a reasonable plan?
Does that sound like a reasonable plan?
Yes yes! I totally forgot that all we want is 😅
enum Atom {
Inline(..),
ArenaAllocated(..)
}
@Boshen I don't think this should have been closed. Github seems to have misinterpretted the commit message on dad94f7ddede9d7308df8c92675cfff33a6266b5.
Thanks!
:+1: I just ran into this with parsing a trivial script containing only 3n
using MIRI.
👍 I just ran into this with parsing a trivial script containing only
3n
using MIRI.
@aapoalas Do you mind sharing your miri setup? I'd like to add a CI for it.
My previous attempt wasn't robust enough https://github.com/oxc-project/oxc/blob/main/.github/workflows/miri.yml
👍 I just ran into this with parsing a trivial script containing only
3n
using MIRI.@aapoalas Do you mind sharing your miri setup? I'd like to add a CI for it.
My previous attempt wasn't robust enough https://github.com/oxc-project/oxc/blob/main/.github/workflows/miri.yml
Unfortunately I only just ran the rustup script to add nightly toolchain, and then ran cargo +nightly miri test
, nothing more than that. It doesn't seem particularly more robust than your workflow.
Weird, I tried cargo miri test -p oxc_parser -p oxc_ast -p oxc_allocator -p oxc_semantic
in CI and it passed 🤔 https://github.com/oxc-project/oxc/actions/runs/7529087020/job/20492604930
If I run the whole test suite it throws something from crossbeam
--> /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/crossbeam-epoch-0.9.17/src/atomic.rs:204:11
|
204 | &*(ptr as *const T)
| ^^^^^^^^^^^^^^^^^ integer-to-pointer cast
|
= help: This program is using integer-to-pointer casts or (equivalently) `ptr::from_exposed_addr`,
= help: which means that Miri might miss pointer bugs in this program.
= help: See https://doc.rust-lang.org/nightly/std/ptr/fn.from_exposed_addr.html for more details on that operation.
= help: To ensure that Miri does not miss bugs in your program, use Strict Provenance APIs (https://doc.rust-lang.org/nightly/std/ptr/index.html#strict-provenance, https://crates.io/crates/sptr) instead.
= help: You can then pass the `-Zmiri-strict-provenance` flag to Miri, to ensure you are not relying on `from_exposed_addr` semantics.
= help: Alternatively, the `-Zmiri-permissive-provenance` flag disables this warning.
https://github.com/oxc-project/oxc/actions/runs/7528975555/job/20492273972
Is it possible that the tests don't create any AST which contains either BigInt
s, or strings/identifiers longer than 24 bytes? BigInt
always heap allocates, but CompactStr
stores strings of up to 24 bytes inline, and only heap allocates for strings longer than that.
If the tests don't do either of these things, they won't trigger the memory leaks.
I imagine running the parser on the files used for the conformance tests would allow miri to find the leaks.
Thank you both. Replicated
https://github.com/oxc-project/oxc/actions/runs/7530649712/job/20497529584
error: memory leaked: alloc57077 (Rust heap, size: 8, align: 8), allocated here:
--> /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9
|
98 | __rust_alloc(layout.size(), layout.align())
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: inside `std::alloc::alloc` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:98:9: 98:52
= note: inside `std::alloc::Global::alloc_impl` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:181:73: 181:86
= note: inside `<std::alloc::Global as std::alloc::Allocator>::allocate` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/alloc.rs:241:9: 241:39
= note: inside `alloc::raw_vec::RawVec::<u64>::allocate_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:199:45: 199:67
= note: inside `alloc::raw_vec::RawVec::<u64>::with_capacity_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/raw_vec.rs:145:9: 145:69
= note: inside `std::vec::Vec::<u64>::with_capacity_in` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:672:20: 672:61
= note: inside `std::vec::Vec::<u64>::with_capacity` at /home/runner/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:481:9: 481:49
= note: inside `num_bigint::biguint::convert::from_radix_digits_be` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/biguint/convert.rs:122:20: 122:74
= note: inside `num_bigint::biguint::convert::<impl num_traits::Num for oxc_ast::BigUint>::from_str_radix` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/biguint/convert.rs:269:13: 269:44
= note: inside `num_bigint::bigint::convert::<impl num_traits::Num for num_bigint::BigInt>::from_str_radix` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/bigint/convert.rs:39:18: 39:51
= note: inside `num_bigint::BigInt::parse_bytes` at /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/num-bigint-0.4.4/src/bigint.rs:675:9: 675:41
note: inside `lexer::number::parse_big_int`
--> crates/oxc_parser/src/lexer/number.rs:97:5
|
97 | BigInt::parse_bytes(s.as_bytes(), radix).ok_or("invalid bigint")
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Found some arcane code from ratel, where we can use for our atom:
Here's where I'm up to with Atom
so far:
https://github.com/overlookmotel/oxc/tree/new-atom/crates/oxc_atom/src
It's heavily based on CompactStr
, and I've gone fairly deep figuring out how all their optimizations work (and also finding a couple of my own).
But thanks for link to Ratel's implementation - I'll see if anything we can use from it which is specific to strings in JS.
Working on Atom
has also thrown up some thoughts which are a bit broader about the parser/lexer overall, and would be good to resolve before locking Atom
's API. For various reasons, I think API should be more constrained than what I have now. I'll raise issues about these questions when I get time.
One question for now: How relaxed are you about breaking changes?
I mean: Is it better to work out API design thorougly before implementing, so the breaking changes happens all in one go? Or is it OK to merge an initial implementation (including breaking changes) and then modify API again later (more breaking changes)?
Also, a naming thing: Should we change the name from Atom
to just String
?
The name Atom
suggest that identical strings are deduplicated. That used to be that case, but it isn't any more, so the name is now misleading.
Also, a naming thing: Should we change the name from Atom to just String?
Can do since we are not 1.0. But there are many projects that dependents on oxc now, so we'll need to introduce some type alias and the #[deprecated]
macro to smooth out the transition.
Agree that a deprecation period is nice. Our JS engine uses Atom frankly a bit too much. We're just a hobby project so it doesn't really matter that we break though.
OK, understood. How about breaking changes to the methods of the type?
The simplest starting point is to have these methods to create an Atom/String:
fn new(s: &'a str) -> Atom<'a>;
fn new_in(s: &'a str, alloc: &'a Allocator) -> Atom<'a>;
// Limited to 16 bytes max
const fn new_const(s: &'static str) -> Atom<'static>;
But I think in future it'd be better to remove new()
and replace with:
fn new_from_source(SourceSlice<'a>) -> Atom<'a>;
SourceSlice
here is a string slice which is guaranteed to be part of the source code. Constraining Atoms to only be able to reference string content which is either in source code or stored in the arena will have benefits both in performance and functionality.
So my question is: Is it a problem to do the initial implementation first, and then have the breaking change of removing new()
later as a further breaking change? (NB deprecation would not be possible on that change, as it would be fundamental to Atom's internals).
Is it a problem to do the initial implementation first, and then have the breaking change of removing new() later as a further breaking change?
Not a problem. We'll need to keep a change log from now on :-)
OK great. I'm working on something else at the moment, but will come back to this soon.
See comment on #2101:
49% of the entire run time lexing
checker.ts
is spent inbumpalo::collections::string::String::push
.
If we can replace usage of Bumpalo's String
with something else, that'd produce an 10-15% speed-up in parsing some inputs.
Previous comment turned out to be erroneous. But discovered a couple of other things along the way which are relevant to this issue:
AutoCow
is inefficient and could be improved by pushing non-escaped chunks as string slices, rather than pushing many chars 1 by 1.String::push_str
is extremely slow for similar reasons - it pushes each byte of the string 1 by 1, rather than copying a big chunk all at once.
OXC appears to leak memory when source includes any strings over 24 bytes in length.
A demonstration here: https://github.com/overlookmotel/oxc/tree/leak-test
Checkout that branch, and
cargo run -p test_leak
.The problem
I believe the problem is this:
Atom
s.Atom
wrapsCompactString
from compact_str.CompactString
allocates to the heap if string is longer than 24 bytes.Atom
s may appear anywhere in the AST, likely within aBox
.Box
used for the arena has noDrop
impl.Program
is dropped,Atom
s don't get dropped, and the heap allocationsCompactString
made are not freed.In the demo above, I've added
impl Drop for Atom
which logs when anAtom
is dropped. It never gets called, demonstrating that the heap memory is leaked.Possible solutions
1. Add
impl Drop
toBox
This would be like
bumpalo::collections::Box
.Problem is that this would make dropping the AST more expensive, negating a major benefit of using an arena.
2. Track and drop
Atom
s manuallyStore pointers to all non-inline
Atoms
as they're created. ADrop
impl on the arena could then drop them when the arena is dropped.3. Replace
CompactString
Replace
CompactString
with a similar "small string" implementation which stores long strings in the bumpalo arena, instead of the general heap.As well as solving the leak, I believe this would also be more performant:
CompactString
is quite complicated, but OXC only uses a small fraction ofCompactString
's capabilities, so creating a custom small string implementation in OXC I don't think would be too tricky.In addition:
CompactString
is built as a replacement for std'sString
, and includes acapacity
field to allow resizing. But OXC'sAtom
is immutable, so thecapacity
field is not required. All that's needed is aBox<str>
equivalent.So a new "small string" type could lose the
capacity
field and be reduced in size to 16 bytes, rather than the current 24. Obviously, only strings up to 16 bytes could then be stored inline, but still most JS identifiers are under 16 characters, and if string content is stored in the arena, storing longer strings "out of line" has less performance penalty.Further optimizations
For identifers without escapes (pretty much all identifiers) and string literals without escape sequences (most strings),
Atom
could just be a pointer to the string in source code, rather than copying the string into the arena.Secondly, when lexing strings and identifiers containing escape sequences,
AutoCow
creates a string in the arena. That string can be pointed to byAtom
rather than copied to another location in the arena.In fact, escaped strings for some reason currently are stored in the arena three times, though I'm not sure why. Dump of arena contents after parsing
x = 'ABCDE\n';
: