Closed Informate closed 2 months ago
Hi! Can you provide a minimally reproducible example?
My code is rather long and complex, but the solution baffles me. This is the second time I have had this problem (First time on a similar task some year ago, solved in the same way). The really terrible thing is that the solution solves the problem! (Otherwise I would think it was something else).
My code is rather long and complex, but the solution baffles me.
Can you find a minimal reproduction in that? It'd really help demonstrate the issue.
It comes and goes... It have to be related to the HTTP content given to the RegExp, I have no other explanations. Anyway the problem is present since years.
If you've resolved the issue, feel free to self-close.
No! I did not solved the issues... The problem is present since years, 2° time I found it on same task: string from RegExp with text from HTTP request.
The string generates a memory-leakages as it brings with it some shadowed data given by the RegExp module.
Okay, a mininal reproducible example is still important to help diagnose the issue.
I believe it is more related to v8
instead of Node.js
and need to be reported to v8
in order to solve the problem.
string slicing often keep reference to the parent string instead of copying to new buffer. RegExp result should be string slicing.
Reference https://iliazeus.github.io/articles/js-string-optimizations-en/#sliced
Sorry, I do not know where to report it. Anyway in past I have already found and reported a bug in the PCRE RegExp engine under PHP. It could also be the RegExp engine. I have reported this here because I do not know what to do best, and because is the second time I found the same problem/bug.
Reference https://iliazeus.github.io/articles/js-string-optimizations-en/#sliced
Yes, it is exactly what this post is talking about (Chapter: Unintended side effects, or: how to scrub a string), but it ends up crushing the nodejs process instead of freeing the memory and reorganizing the strings in the heap.
@nodejs/v8 Is this a known limitation
?
Maybe they could provide an official scrubString method better than in and out from JSON?
It is unlikely be fixed, you can track the problem in below link.
If you are using RegExp.match
, you should use RegExp.exec
instead of RegExp.match
to prevent another memory leak.
I think that a method to finalize the string, a method in the garbage collector and an automatic invocation of the garbage collector method (when the memory allocation fails for OoM before the heap crashes and then probably the nodejs process is terminated) should be present in the nodejs environment.
As of ChatGPT, a solution could be to use a custom string finalizer function:
let newStr = Buffer.from(str, 'utf-8').toString('utf-8');
The garbage collector method could for example check for big holes of unused memory in the original string buffers starting from the longest one, and then possibly moving (collapsing) all the string references in the string composition tree hierarchy system. (A long task maybe bloking for some ms as old GC but only on request or at OoM).
Okay, a mininal reproducible example is still important to help diagnose the issue.
I posted the example as requested.
Note that Node.js 12.x is End-of-Life since 2022-04-30 (more than 2 years ago at the time of writing), meaning it's no longer supported and issues reported on that version won't be fixed. I would recommend updating to a supported version (e.g. Node.js 22.x or 20.x).
As @aduh95 mentioned, can you reproduce in v18, v20, or v22?
Additionally, could you strip your example to be minimal? Right now it has a lot of different steps, which could interfere with the problem at hand, can you make it minimal?
I have updated to v22.4.0!
Did that resolve the issue?
Did that resolve the issue?
No! It would need a GC update. But as it degrades performance it could be an optional GC feature.
Additionally, could you strip your example to be minimal? Right now it has a lot of different steps, which could interfere with the problem at hand, can you make it minimal?
Again, a minimally reproducible example is crucial when filing a bug report. Without it, there's no way to tell what exactly is causing a certain issue. Currently, your example relies on several functions, so it's hard to tell what's going wrong.
If you're unable to provide such examples, then I'm going to close this issue, as it's not reproducible.
I though minimal reproducible example in Node.js
is referring to only use the Node.js
components.
How many steps or extracted function do not defect it is not a minimal reproducible example.
Personally, I don't think eagerly close the issue with example would helps anything.
The root cause of issue is as simple as v8
string optimization is not good enough to strip the ram usage for referenced string view.
It is not actionable in Node.js
side but the issue itself still persist.
I believe a minimally reproducible example refers to an example leaving nothing up to chance, such as fetching arbitrary data (in this case Wikipedia).
Because this issue relies on RegExp, but http is used, it's hard to tell which part could cause the issue at hand.
I might be wrong tho.
Additionally, if it's not actionable on the Node.js side, than there's nothing we can do about it.
I believe a minimally reproducible example refers to an example leaving nothing up to chance, such as fetching arbitrary data (in this case Wikipedia).
Because this issue relies on RegExp, but http is used, it's hard to tell which part could cause the issue at hand.
I might be wrong tho.
Additionally, if it's not actionable on the Node.js side, than there's nothing we can do about it.
@RedYetiDev I got your point, but this is a minimal example not relying on chance.
I give you a better explanation of what is happening:
We need:
1 - something loading new data from buffers into strings: https get load new data into the buffers. 2 - something cutting the strings and keeping just a small part of the string. 3 - repeat until memory saturation.
I will give you a minimal example as soon as I can if this is too complex. But this is working and not on chanches.
@RedYetiDev I have provided the minimal example working for me as requested. I've tested it on node v12 and node v22. Both killed by the OoM error.
Let me know if now is it ok!
AFAICT there's nothing that can be done from the Node.js side, as this is controlled by V8, but I would like @nodejs/v8 to verify that information.
AFAICT there's nothing that can be done from the Node.js side, as this is controlled by V8, but I would like @nodejs/v8 to verify that information.
So where should we post the issue?
I'm not sure yet, let's wait for @nodejs/v8 to weigh in.
The code in the OP grows an array in an infinite loop, I would expect the process to eventually run out of memory – so if that what’s happening, I’d say it’s working as expected.
The code in the OP grows an array in an infinite loop, I would expect the process to eventually run out of memory – so if that what’s happening, I’d say it’s working as expected.
No. It pushes an object of 16 bytes and loose 1Mb of memory! That's the memory leakage! This is the reason you see the memory growing so fast in the example. At each cycle you will see an increase of at last 1mb of memory in the heap.
Even more: So you could run the cycle just 500'000 times on your machine?
You are losing 1MB of memory, as you are creating a 1MB buffer each iteration. This seems to be working as intended. To further show that I don't think there is any memory leakage, I have made a demo script:
function formatMemoryUsage(bytes) {
return `${(bytes / 1024 / 1024).toFixed(2)} MB`;
}
function logMemoryUsage() {
const usage = process.memoryUsage();
console.log(`Memory Usage: Heap Total: ${formatMemoryUsage(usage.heapTotal)}, Heap Used: ${formatMemoryUsage(usage.heapUsed)}`);
}
// Simulate string slicing operations
function simulateStringSlicing() {
let str = 'a'.repeat(10 * 1024 * 1024); // 10 MB string
let slicedArray = [];
// Simulate slicing
for (let i = 0; i < 1000; i++) {
let sliced = str.slice(i, i + Math.floor(Math.random() * 100));
slicedArray.push(sliced);
}
}
setInterval(() => {
simulateStringSlicing();
logMemoryUsage();
}, 1000);
After letting it run for some time, I saw no abnormal memory changes:
Memory Usage: Heap Total: 14.20 MB, Heap Used: 12.77 MB
Memory Usage: Heap Total: 15.70 MB, Heap Used: 13.16 MB
Memory Usage: Heap Total: 16.20 MB, Heap Used: 13.18 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.16 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.16 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.17 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.17 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.18 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.19 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.25 MB
Memory Usage: Heap Total: 15.20 MB, Heap Used: 13.25 MB
Memory Usage: Heap Total: 14.45 MB, Heap Used: 13.07 MB
Memory Usage: Heap Total: 14.45 MB, Heap Used: 13.07 MB
If you want to try it in an isolated system, I've hosted the code on RunKit.
I don't think there is any memory leakage, I have made a demo script
Your demo shows unclear understanding of the problem. Try the below one, you can see the occupied memory never released which is the memory of the parent string, not the sliced string.
In expected situation, it should release the memory of the unwanted parts of parent string and only occupy the memory of sliced string. Because it keeps in references and never dispose.
function formatMemoryUsage(bytes) {
return `${(bytes / 1024 / 1024).toFixed(2)} MB`;
}
function printMemoryUsage() {
const usage = process.memoryUsage()
console.log(
`Memory Usage: RSS: %s, Heap Total: %s, Heap Used: %s, External: %s, ArrayBuffers: %s`,
formatMemoryUsage(usage.rss),
formatMemoryUsage(usage.heapTotal),
formatMemoryUsage(usage.heapUsed),
formatMemoryUsage(usage.external),
formatMemoryUsage(usage.arrayBuffers),
);
}
// we use an array to store sliced reference
const references = []
// Simulate string slicing operations
function simulateStringSlicing() {
let str = 'a'.repeat(10 * 1024 * 1024); // 10 MB string
// we take random small part of the string
const start = Math.floor(Math.random() * 100)
const end = Math.floor(Math.random() * 1000)
references.push(str.slice(start, end))
}
// run for 10 times
printMemoryUsage()
for(let i = 0; i < 10; i++) {
simulateStringSlicing()
}
printMemoryUsage()
// check for memory usage
setInterval(printMemoryUsage, 1000);
Sorry for the misunderstanding, I indeed didn't understand that part of the question.
If I understand correctly now, when slicing a string, the full string removed from memory so long as the sliced reference still exists.
I've hidden my comment to help not de-rail the conversation.
If I understand correctly now, when slicing a string, the full string removed from memory so long as the sliced reference still exists.
Yes, as long as you sliced a string. It keeps the reference of the original string and never remove the unwanted part from memory.
Even the code shows no where to use it again.
You may try to unreference the original string by str = null
after the array push. It stills keep in memory.
Thanks. I apologize for my misunderstanding. The String
class is inherited directly from V8. As @climba03003 mentioned earlier, this issue is tracked at https://issues.chromium.org/issues/41480525.
This issue has been previously posted at https://github.com/nodejs/node/issues/25846, and, at the end-of-the day, there's not much that can be done on the Node.js side of things. I suggest you comment on that issue with how this affects you, and try to get Google's V8 team to work on this issue.
This issue has existed for over a decade, and I'm not sure if a patch is anywhere soon.
Version
v22.4.0
Platform
Subsystem
Strings, Garbage Collector
What steps will reproduce the bug?
At OoM in some example just 1%-2% of the heap is really used, a 5%-10% waste could be usual. In the following at the edge example a memory waste of 99.9984% should be achieved.
How often does it reproduce? Is there a required condition?
At OoM.
What is the expected behavior? Why is that the expected behavior?
Recover from the error or treat it before it occurs. Java and JavaScript are programming languages that manage memory and for which the programmer does not need to keep track of memory. Instead, in this situation, the programmer needs to keep track of memory and handle the situation manually.
What do you see instead?
Keeping the string resulting from RegExp operations generates a Memory Leakages. The RegExp extracts short parts from an http document retrived from https. The bug was already present some year ago (And never solved I suppose). Cleaning the string will solve the memory leakages (this will leave JSON related leakages):
or better (without other leaksges):
The problem is not solved by other simple operations as .slice(0) or .toString().
Additional information
The Strings subsystem waste memory and the garbage collector is not able to recover from the heap crash.
To solve the problem and recover the garbage collector could require memory maps.
Random Sampling operations could be possible in the standard GC scheduling and frequency could be increased on matching. Random Sampling with just 1 sample a time would introduce a very limited overhead on GC. The sampling test would require a full GC cycle on the old space. Exploiting a fractal partitioning algorithm ( 1° middle of the space, 2° middle of first half, 3° middle of second half, 4° middel of fisrt quarter, ...) could reduce random sampling overhead. When the sampling test fails, succesive iterations up to the completion of GC cycle could be used to propose a new candidate (continuing the partial test on the next candidate). With a 10% memory waste the 1 sample random sampling should match in approximately 11 full GC cycles.
Running the new GC task in a separate low-priority GC thread on the old heap space would require a limited ammount of resources. For example using 32 sample from Random Sampling or Fractal Partitioning with free space bounds checks in the task would require [ 32 sizeof(memory_address) 3 bytes = 768 ] less than 1kb, plus a stack as deep as the string hierarchy tree structure multiplied by a costand that could be as small as 2,3 or just 1 (inlineing the next strings to checks in an array).