ziyoung opened this issue 1 month ago
The memory allocated during the execution of the captures_iter method was not released.
This doesn't imply a leak. When you run a regex, it has to allocate space to run the search. This space is tied to the `Regex` and cached in a thread-safe manner. The amount of memory allocated scales proportionally with the maximum number of simultaneous uses. Not all of the memory allocated is kept, but some is.
When you clone the `Regex`, you're creating a new pool of memory that gets dropped at the end of the function.
The expectation that all memory allocated during a search gets released is just wrong. A memory leak is something that happens when memory used grows without bound given an otherwise fixed set of resources.
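The caching model can be sketched with a toy pool (std-only, and a deliberate simplification, not the real `regex` internals): scratch space is allocated on first use, handed out behind a guard, and returned to a cache when the guard is dropped, so repeated sequential searches allocate only once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Counts fresh allocations so we can observe reuse happening.
static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Toy scratch pool: `get` pops a cached buffer or allocates a new one;
// dropping the guard pushes the buffer back for the next search.
struct Pool(Mutex<Vec<Vec<u8>>>);

struct Guard<'a> {
    pool: &'a Pool,
    buf: Option<Vec<u8>>,
}

impl Pool {
    fn new() -> Pool {
        Pool(Mutex::new(Vec::new()))
    }
    fn get(&self) -> Guard<'_> {
        let buf = self.0.lock().unwrap().pop().unwrap_or_else(|| {
            ALLOCS.fetch_add(1, Ordering::SeqCst);
            vec![0u8; 1024] // stand-in for NFA/DFA scratch space
        });
        Guard { pool: self, buf: Some(buf) }
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.pool.0.lock().unwrap().push(self.buf.take().unwrap());
    }
}

fn main() {
    let pool = Pool::new();
    for _ in 0..1000 {
        let _scratch = pool.get(); // "run one search"
    } // guard drops here; the buffer goes back into the pool
    // 1000 searches, one allocation: the memory is cached, not leaked.
    assert_eq!(ALLOCS.load(Ordering::SeqCst), 1);
    println!("allocations for 1000 searches: {}", ALLOCS.load(Ordering::SeqCst));
}
```

The point of the sketch: "not released" and "leaked" are different things. The buffer is retained on purpose so the next search doesn't pay for a fresh allocation.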
Thanks. Cloning the original regex object does indeed avoid the growth of memory usage.
- for (i, caps) in REG.captures_iter(&text).enumerate() {
+ for (i, caps) in REG.clone().captures_iter(&text).enumerate() {
// Total: 1.3M
for (i, caps) in REG.clone().captures_iter(&text).enumerate() { // <--- 0.8M
// ...
}
According to the regex documentation, we should indeed avoid using a regex across threads. So this is not a bug. I'll close this issue.
Using a regex across threads is fine. And cloning a regex for every search is almost certainly wrong and will lead to much much slower searches. Your peak memory usage should generally be the same in either case.
And that section in the regex docs is specifically about contention on shared mutable resources leading to slower searches. Not memory usage.
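Why clone-per-search is wasteful can be illustrated with a toy pool (a simplified stand-in for the cache inside a `Regex`, not the real API): a cloned `Regex` starts with a fresh, empty pool, so every search pays for a fresh allocation that is then thrown away.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Toy scratch pool standing in for the cache inside a Regex.
struct Pool(Mutex<Vec<Vec<u8>>>);

impl Pool {
    fn new() -> Pool {
        Pool(Mutex::new(Vec::new()))
    }
    // Pop a cached buffer, or allocate fresh scratch on a miss.
    fn get(&self) -> Vec<u8> {
        self.0.lock().unwrap().pop().unwrap_or_else(|| {
            ALLOCS.fetch_add(1, Ordering::SeqCst);
            vec![0u8; 1024]
        })
    }
    fn put(&self, buf: Vec<u8>) {
        self.0.lock().unwrap().push(buf);
    }
}

fn main() {
    // Shared regex: one pool survives across searches, so scratch is reused.
    let shared = Pool::new();
    for _ in 0..1000 {
        let scratch = shared.get();
        shared.put(scratch);
    }
    assert_eq!(ALLOCS.load(Ordering::SeqCst), 1);

    // Clone-per-search: every clone starts with an empty pool, so every
    // search allocates fresh scratch.
    for _ in 0..1000 {
        let cloned = Pool::new(); // like `REG.clone()`
        let scratch = cloned.get();
        cloned.put(scratch);
    } // the clone and its cached buffer are dropped each iteration
    assert_eq!(ALLOCS.load(Ordering::SeqCst), 1001);
    println!("shared: 1 allocation; clone-per-search: 1000 allocations");
}
```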
> Using a regex across threads is fine.
If I execute memory_leak_demo more than 2000 times, I find that the memory usage keeps growing. Suppose I am writing an API request program. For each request, I need to call the REG.captures_iter method, and the memory usage will keep growing. This is unacceptable.

| Times | Total | Regex memory usage |
|---|---|---|
| 500 | 7.5M | 7M |
| 1000 | 18.1M | 17.6M |
| 1500 | 31.7M | 31.2M |
| 2000 | 41.3M | 40.8M |
| 2250 | 51.8M | 51.3M |
| 2500 | 60.4M | 59.9M |
Is there an upper limit to the memory occupied by regex?
After testing, the memory usage does not keep growing when executing the code snippet below (no regex cloning here). Can we say: if you use `captures_iter` in an async function, you should wrap it in a synchronous blocking function?
+ fn get_tag_positions(src: &str) -> Vec<(usize, usize)> {
+ REG
+ .captures_iter(src)
+ .map(|caps| {
+ let m = caps.get(0).unwrap();
+ (m.start(), m.end())
+ })
+ .collect::<Vec<_>>()
+ }
// ...
- for (i, caps) in REG.captures_iter(&text).enumerate() {
+ for (i, (start, end)) in get_tag_positions(&text).into_iter().enumerate() {
I finally had a chance to take a quick peek at this, and your MRE is over-complicated. So I trimmed it down to this:
use std::time::Duration;
use once_cell::sync::Lazy;
use regex::Regex;
static REG: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(Object\.\w+)"#).unwrap());
async fn memory_leak_demo() -> Vec<u8> {
let text = tokio::fs::read_to_string("vue.cjs.js").await.unwrap();
let mut buf = Vec::from(text.as_bytes());
for (i, caps) in REG.captures_iter(&text).enumerate() {
println!("process text: {}", i + 1);
tokio::time::sleep(Duration::from_millis(50)).await;
let m = caps.get(0).unwrap();
let start = m.start();
let end = m.end();
let text = &text[start..end];
buf.extend(text.as_bytes());
}
buf
}
async fn task() {
for _ in 0..200_000 {
let buf = memory_leak_demo().await;
let s = String::from_utf8(buf).unwrap_or_default();
println!("string length: {}", s.len());
}
}
#[tokio::main]
async fn main() {
tokio::spawn(async {
task().await;
})
.await
.unwrap();
println!("done");
}
With this `Cargo.toml`:
[package]
name = "memory-leak-example"
version = "0.1.0"
edition = "2021"
[dependencies]
tokio = { version = "1", features = ["full", "tracing"] }
regex = "1.10.5"
once_cell = "1"
anyhow = "1"
Notice that `task` now has a loop that runs `200_000` times. I ran this with `cargo run -r`. I watched `RES` memory usage in `htop` and it remains fixed. No increases. That means there are no leaks.
Your measurement process is likely flawed. Diagnosing a memory leak is hard. But the fact that you're seeing differences between async and sync should be a tip-off that perhaps there is something about async that is causing higher peak memory usage for you. For example, if you have a bunch of tasks running simultaneously, perhaps with some of them paused at their `await` points, then the other running tasks cannot access the memory being used by `regex` in the paused tasks. So those other tasks have to create their own memory. In some cases that memory is dropped, and in others it is put back into the pool for reuse. But the invariant remains: memory usage scales proportionally with the number of simultaneous uses of the `Regex`. So if you have a lot of simultaneous uses, it can look like a leak, but it isn't.
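That invariant can be sketched std-only (a hypothetical pool, not the real implementation): if N threads hold scratch simultaneously, the pool grows to N buffers and then stays there, and later sequential searches reuse those buffers without allocating.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Toy scratch pool standing in for the cache inside a Regex.
struct Pool(Mutex<Vec<Vec<u8>>>);

impl Pool {
    fn new() -> Pool {
        Pool(Mutex::new(Vec::new()))
    }
    fn get(&self) -> Vec<u8> {
        self.0.lock().unwrap().pop().unwrap_or_else(|| {
            ALLOCS.fetch_add(1, Ordering::SeqCst);
            vec![0u8; 1024]
        })
    }
    fn put(&self, buf: Vec<u8>) {
        self.0.lock().unwrap().push(buf);
    }
    fn cached(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}

fn main() {
    let pool = Arc::new(Pool::new());
    let barrier = Arc::new(Barrier::new(8));
    // 8 searches in flight at the same time: each needs its own scratch.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (pool, barrier) = (Arc::clone(&pool), Arc::clone(&barrier));
            thread::spawn(move || {
                let scratch = pool.get();
                barrier.wait(); // force all 8 to hold scratch simultaneously
                pool.put(scratch);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Peak simultaneous use was 8, so 8 buffers now live in the pool...
    assert_eq!(pool.cached(), 8);
    // ...and later sequential searches reuse them: no new allocations.
    for _ in 0..100 {
        let scratch = pool.get();
        pool.put(scratch);
    }
    assert_eq!(ALLOCS.load(Ordering::SeqCst), 8);
    println!("allocations: {}", ALLOCS.load(Ordering::SeqCst));
}
```

Under this model, memory usage tracks peak concurrency, not the total number of searches, which is why it can look like a leak under load but flattens out once the peak is reached.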
If you want to demonstrate a leak, then please provide a simple program that I can run where the resident memory usage grows without bound while otherwise using fixed resources, as reported by a process monitor like `htop`. Running a memory profiler is a great technique for debugging leaks, but a profile by itself isn't dispositive of a leak.
> I watched RES memory usage in htop and it remains fixed. No increases. That means there are no leaks.
I used `htop` to monitor the memory usage and it showed no significant increase. The memory profile file had misled me.
> But the invariant remains: memory usage scales proportionally with the number of simultaneous uses of the Regex.
I wrote the following code to simulate a large number of regex matches in a short period of time. Running the regex matching in a synchronous block reduces the number of simultaneous matches and reduces memory usage. I think the problem that came up in my project has something to do with this. Of course, this is not a memory leak; it is just a matter of how the regex is used.
use std::{env, time::Duration};
use once_cell::sync::Lazy;
use regex::Regex;
use tokio::{fs, task::JoinSet};
static REG: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(Object\.\w+)"#).unwrap());
async fn memory_leak_demo() -> Vec<u8> {
let file = fs::read("./vue.js").await.unwrap();
let mut buf = file.clone();
let text = String::from_utf8(file).unwrap();
if env::var("SYNC_REG_MATCH").is_ok() {
let positions = REG
.captures_iter(&text)
.map(|caps| {
let m = caps.get(0).unwrap();
(m.start(), m.end())
})
.collect::<Vec<_>>();
for (start, end) in positions {
tokio::time::sleep(Duration::from_millis(5)).await;
buf.extend(text[start..end].as_bytes());
}
} else {
for caps in REG.captures_iter(&text) {
tokio::time::sleep(Duration::from_millis(5)).await;
let m = caps.get(0).unwrap();
let start = m.start();
let end = m.end();
let text = &text[start..end];
buf.extend(text.as_bytes());
}
}
buf
}
#[tokio::main]
async fn main() {
{
tokio::spawn(async {
for i in 0..20 {
let mut set = JoinSet::new();
for _ in 0..10_000 {
set.spawn(memory_leak_demo());
}
while let Some(_res) = set.join_next().await {}
println!("task {} done", i);
tokio::time::sleep(Duration::from_millis(10)).await;
}
})
.await
.unwrap();
}
println!("use CTRL + C to exit");
tokio::signal::ctrl_c().await.unwrap();
}
./memory-leak-example:
memory usage: about 1099M
SYNC_REG_MATCH=1 ./memory-leak-example:
memory usage: about 90M
Yes, as I said, the likely reason here is that in your "synchronous" version, there are no await points while you're collecting the matches from your regex, so your async task cannot be stopped while you're consuming `captures_iter`. But in your asynchronous version, you have await points while traversing `captures_iter`. If a task is stopped during iteration and another starts in its place, then the second task can't use the memory the first task was using, because it's still borrowed by its `captures_iter`. Once both tasks complete, the memory is either dropped or put back in the pool.
So I think the lesson here is to be careful of introducing await points while consuming a regex match iterator, because a regex match iterator has memory associated with it that can't be shared with another iterator so long as it's live.
I've been debugging a memory "leak" that I only finally tracked down to this allocation of memory.
The application I maintain (the Rust port of the Matomo device detector) has tens of thousands of regexes that must be run concurrently. Locking every individual regex is not great, cloning before every match is not great, periodically reinitializing each regex through `Lazy` might be doable, somehow (maybe), and limiting the concurrency of queries to the system is really not great. But I'm not sure what else to do, as the memory gradually creeps up over a span of weeks.
Maybe the cache statistics on the amount of memory stored could be made accessible through the regex somehow, so the cache could be cleared occasionally?
@mindreader It might be worth pursuing a compaction strategy: https://github.com/rust-lang/regex/blob/ab88aa5c6824ebe7c4b4c72fe5191681783b3a68/regex-automata/src/util/pool.rs#L318-L320
Or alternatively, we could be more aggressive about throwing away scratch memory here: https://github.com/rust-lang/regex/blob/ab88aa5c6824ebe7c4b4c72fe5191681783b3a68/regex-automata/src/util/pool.rs#L607-L625
Today, we only throw memory away when there is too much contention on the shared resource. But maybe there should be other cases where we throw it away.
It seems hard to pick a correct heuristic for this. And adding a method to "clear cached memory because it might get too big over time" seems pretty unfortunate.
I'm going to re-open this and think about it a bit more.
Thanks for pointing out the relevant code. I will definitely take a closer look.
Another point to consider is that the memory increase should only occur if you're iterating over matches and there are await points within the handling of each element. If there aren't any await points, then there isn't any opportunity for the scratch memory to be held by a task and not sent back to the pool.
This happens because memory is retrieved at the start of iteration and not put back until the iterator is dropped. This is done to avoid needing to retrieve this memory for each search. But perhaps this is the wrong trade-off to make when each iteration has an await point.
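That trade-off can be sketched with a toy guard-based pool (an illustration, not the actual `regex` code): while one "iterator" holds its scratch, a second search that starts in the meantime must allocate fresh scratch, and both buffers return to the pool only when their guards are dropped.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

// Toy scratch pool: the guard models a live match iterator, which keeps
// its scratch checked out until it is dropped.
struct Pool(Mutex<Vec<Vec<u8>>>);

struct Guard<'a> {
    pool: &'a Pool,
    buf: Option<Vec<u8>>,
}

impl Pool {
    fn new() -> Pool {
        Pool(Mutex::new(Vec::new()))
    }
    fn get(&self) -> Guard<'_> {
        let buf = self.0.lock().unwrap().pop().unwrap_or_else(|| {
            ALLOCS.fetch_add(1, Ordering::SeqCst);
            vec![0u8; 1024]
        });
        Guard { pool: self, buf: Some(buf) }
    }
    fn cached(&self) -> usize {
        self.0.lock().unwrap().len()
    }
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.pool.0.lock().unwrap().push(self.buf.take().unwrap());
    }
}

fn main() {
    let pool = Pool::new();
    {
        // A paused iterator (e.g. a task parked at an await point)...
        let iter_a = pool.get();
        // ...cannot lend its scratch to a second search starting now,
        // so the second search allocates fresh scratch.
        let iter_b = pool.get();
        assert_eq!(ALLOCS.load(Ordering::SeqCst), 2);
        drop(iter_b);
        drop(iter_a);
    }
    // Both buffers are back in the pool once the iterators are dropped.
    assert_eq!(pool.cached(), 2);
    let _reused = pool.get();
    assert_eq!(ALLOCS.load(Ordering::SeqCst), 2); // reuse, no new allocation
    println!("ok");
}
```

This is why collecting match positions into a `Vec` before the first await point (dropping the iterator, and with it the scratch guard) keeps peak memory down in the async case.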
Everything you say tracks; it shouldn't lose memory at all, but it seems to in my case. But I need to find a truly minimal test case before I can even begin to understand why.
I had no idea how much effort went into this. While I intend to experiment with this, I have little hope that I will discover something you haven't, so feel free to close this, and if I do by miracle find something workable I promise I will respond or open a PR.
Do you have await points in your captures_iter loop?
My case isn't exactly like the OP's: I'm not capturing over await points per se, but rather operating on a single, very large collection of `Regex`es shared via an `Arc` across separate threads. There are actually no awaits in the matching code, just in the outer loop, and no data is shared between loop iterations. That means every regex should have allocated its own scratch, and that scratch should be reclaimed except for the "owner" thread's copy.
But the results from the tests I'm running don't really make sense to me. I have far too many variables in my code to consider this a worthwhile bug report. Please, let me get some spare time to remove everything that could be complicating this before taking anything I say seriously.
Aye, understood. Thank you for investigating!
I had some time this weekend to run a lot of tests and now I understand how it all works. It isn't a loss of memory, it is just that over time if your application uses a lot of different threads, you will accrue 8 full cache lines on each regex and they will never go away.
It isn't a matter of lots of concurrency, merely accessing via many threads gradually will eventually fill up the cache, even if each one is accessed one thread at a time. That's why this happens to me over a span of weeks.
The following change was enough to reduce my memory usage in real world scenarios by 50%+ while increasing cpu usage by only 15%, a tradeoff I would happily accept, except for the difficulty of exposing that level of configurability without forking.
diff --git a/regex-automata/src/util/pool.rs b/regex-automata/src/util/pool.rs
index d90d4ec..be5142e 100644
--- a/regex-automata/src/util/pool.rs
+++ b/regex-automata/src/util/pool.rs
@@ -328,7 +328,7 @@ mod inner {
///
/// See this issue for more context and discussion:
/// https://github.com/rust-lang/regex/issues/934
- const MAX_POOL_STACKS: usize = 8;
+ const MAX_POOL_STACKS: usize = 1;
thread_local!(
/// A thread local used to assign an ID to a thread.
@@ -628,7 +628,7 @@ mod inner {
/// Create a guard that represents the special owned T.
#[inline]
fn guard_owned(&self, caller: usize) -> PoolGuard<'_, T, F> {
- PoolGuard { pool: self, value: Err(caller), discard: false }
+ PoolGuard { pool: self, value: Err(caller), discard: true }
}
So we get rid of the owner thread cache because it is basically used once and never again in this scenario. And we remove all but the first cache since it will almost definitely never need more than one at a time, and I'm willing to allocate in the case that we do.
Since memory can only be reclaimed upon use of the regex, that greatly limits the ways in which you could optimize this. I thought of maybe a cache where each cache line was thread independent but it would take some real time to validate this theory and I have a feeling it would definitely hurt some benchmarks.
> I had some time this weekend to run a lot of tests and now I understand how it all works. It isn't a loss of memory, it is just that over time if your application uses a lot of different threads, you will accrue 8 full cache lines on each regex and they will never go away.
Ah I see. That's interesting. I do still wonder if number of threads has a role to play here though, since each cache line is a stack of reusable allocations. As long as there isn't any contention, these stacks can continue to grow I think. But I suppose growth is only possible when there is actual concurrency happening.
Some thoughts: for `regex`, I would prefer not to expose knobs for this, or if so, something simplified. For `regex-automata`, I'm generally more open to exposing more knobs.

It is an interesting problem. I will give it a shot if I can get enough spare time. I can't honestly give super good odds of success.
What version of regex are you using?
Latest 1.10.5
Describe the bug at a high level.
Calling the static regex instance's `captures_iter` in an `async` function may cause a memory leak.

What are the steps to reproduce the behavior?

What is the actual behavior?

The memory allocated during the execution of the `captures_iter` method was not released. I dumped the memory profile while the program was running: https://github.com/ziyoung/memory-leak-example?tab=readme-ov-file#regex-memory-issue

If I change my code as follows, there's no memory leak: https://github.com/ziyoung/memory-leak-example/pull/1/files#diff-42cb6807ad74b3e201c5a7ca98b911c5fa08380e942be6e4ac5807f8377f87fc
What is the expected behavior?
There's no memory leak.