Closed: dtolnay closed this issue 1 year ago.
This is fixed on crates.io in regex 1.9.2 and regex-automata 0.3.5.
Thank you greatly. I confirmed that Buck2 memory usage with regex 1.9.2 and globset 0.4.13 is at or slightly below what we had prior to the regex 1.9 update.
What version of regex are you using?
1.9.1
Describe the bug at a high level.
I am investigating a recent large memory usage regression in Buck2 that bisected to a regex update from 1.8.4 to 1.9.0, and persists in 1.9.1.
There are some long-lived GlobSet objects in Buck2 corresponding to filepath patterns that buck has been configured to ignore. The real-world code is here: https://github.com/facebook/buck2/blob/4d300b08ec0c742952f4cdd9abf1c1055c7a75ab/app/buck2_common/src/ignores/ignore_set.rs.
I instrumented the RegexSet created by the globset crate, and was able to create the following repro that shows +600MB being allocated and retained by the RegexSet implementation for a seemingly pretty reasonably sized globset.
What are the steps to reproduce the behavior?
cargo run --release:
What is the actual behavior?
Regex 1.8.4:
Regex 1.9.1:
What is the expected behavior?
I am interested in knowing whether this amount of allocation is intended for the above regex, or if this might be fixable.
Buck2 is not particularly sensitive to the total amount of allocation done, but very sensitive to steady-state retained memory (with widely used RegexSet objects sitting around) and high-water mark.
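To make the distinction between total allocation, retained memory, and high-water mark concrete, here is a minimal sketch of how those three numbers could be measured around any constructor using only the standard library. It is not the instrumentation used for the repro in this issue. The names CountingAlloc, Usage, measure, and demo_workload are hypothetical, the workload is a stand-in for building a GlobSet, and the counters assume nothing else is allocating on other threads while the measurement runs.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

// Hypothetical counting allocator: forwards to the system allocator and
// maintains three process-wide counters.
struct CountingAlloc;

static TOTAL: AtomicUsize = AtomicUsize::new(0); // bytes ever allocated
static LIVE: AtomicUsize = AtomicUsize::new(0); // bytes currently live
static PEAK: AtomicUsize = AtomicUsize::new(0); // high-water mark of LIVE

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            let size = layout.size();
            TOTAL.fetch_add(size, Relaxed);
            let live = LIVE.fetch_add(size, Relaxed) + size;
            PEAK.fetch_max(live, Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Relaxed);
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

// Byte counts observed while a closure runs, relative to its starting point.
#[derive(Debug)]
struct Usage {
    allocated: usize, // total bytes requested, including freed ones
    retained: usize,  // bytes still live when the closure returns
    peak: usize,      // highest live level reached above the starting level
}

// Runs `build` and reports its allocation behaviour. The returned value is
// kept alive by the caller, so whatever it owns counts as retained.
fn measure<T>(build: impl FnOnce() -> T) -> (T, Usage) {
    let total_before = TOTAL.load(Relaxed);
    let live_before = LIVE.load(Relaxed);
    PEAK.store(live_before, Relaxed); // restart the high-water mark here
    let value = build();
    let usage = Usage {
        allocated: TOTAL.load(Relaxed) - total_before,
        retained: LIVE.load(Relaxed).saturating_sub(live_before),
        peak: PEAK.load(Relaxed).saturating_sub(live_before),
    };
    (value, usage)
}

// Stand-in workload: 1 MiB of scratch space that is freed, 64 KiB that is kept.
fn demo_workload() -> (Vec<u8>, Usage) {
    measure(|| {
        // black_box keeps the optimizer from eliding the scratch allocation.
        let scratch = black_box(vec![0u8; 1 << 20]);
        scratch[..1 << 16].to_vec()
    })
}
```

In this sketch the workload allocates a little over 1 MiB in total and peaks at that level, but retains only about 64 KiB once the scratch buffer is dropped. Wrapping the construction of a long-lived object in measure would separate the transient cost from the steady-state cost in the same way.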