Closed: erik-induro closed this issue 10 months ago.
Thanks for opening an issue @erik-induro. A few follow-up questions: when you say 'pop', are you seeing an OOM problem? How large are the files you are reading, and what is the typical number of lines you are reading? What are the resources of the underlying machine you are running this on?
Edit: Have you tried turning off validation when creating the read stream to see if that changes what you are seeing? When using a ranged read like your second example, validation is automatically disabled.
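For reference, a sketch of the two options mentioned above (the createReadStream calls themselves are shown commented out since they need a real bucket; the option names come from the @google-cloud/storage createReadStream documentation, but the 64 KiB range is just an illustrative value):

```javascript
// Options for file.createReadStream() in @google-cloud/storage.

// Disable CRC32c/MD5 checksum validation on a full read:
const noValidation = { validation: false };

// Ranged read of just the first 64 KiB; ranged reads disable
// validation automatically:
const ranged = { start: 0, end: 64 * 1024 - 1 };

// const stream = file.createReadStream(noValidation);
// const stream = file.createReadStream(ranged);
console.log(noValidation.validation, ranged.end);
```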
@ddelgrosso1 When I say 'pop' I just mean that it got into a weird state: it wouldn't report telemetry anymore, was fairly unresponsive, and new instances would spin up that I had to terminate manually. We are running in App Engine. It is hard to describe, but it isn't particularly relevant since the symptom could change depending on how people are using the library.
The machine has 4 CPUs and ~26 GB of RAM, but the machine dies at ~5 GB. We use this flag: --max_old_space_size=24576. The files are provided by clients, so they can range from 1 line to ~1 million lines. Most of the files aren't that large; the issue is mostly caused by the quantity (~3k files/night === ~1 GB increase).
I have not tried turning off validation. I can give that a shot.
@ddelgrosso1 I tested turning off validation with the original approach and still saw the memory grow and not release.
@erik-induro thanks for the update. I will dig into this and try and figure out what is going on.
Dug into this a bit today and was not successful in recreating the behavior. First, I created a file containing 750,000 lines and uploaded it to my bucket. I ran the script below, which reads a random number of lines on each iteration, and collected metrics using clinic doctor. The heap allocation graph is below. It looks like heap usage grows and drops back instead of growing in an unconstrained fashion.
@erik-induro is there anything else you might be able to provide to recreate the behavior you are seeing?
import { Storage } from "@google-cloud/storage";
import readline from 'readline';

const s = new Storage({ projectId: 'my-project' });
const b = s.bucket('my-bucket');
const f = b.file('really-large-txt');

function readLines(file, numLines) {
  const readStream = file.createReadStream();
  const lineReader = readline.createInterface(readStream);
  let count = 0;
  const lines = [];
  return new Promise((res, rej) => {
    lineReader.on('line', line => {
      count++;
      lines.push(line);
      if (count >= numLines) {
        lineReader.close();
      }
    });
    lineReader.on('error', e => {
      readStream.destroy();
      rej(e);
    });
    lineReader.on('close', () => {
      readStream.destroy();
      res(lines);
    });
  });
}

async function testMemory() {
  for (let i = 0; i < 500; i++) {
    console.log(`Iteration: ${i}`);
    const rand = Math.floor(Math.random() * 750001);
    await readLines(f, rand);
  }
  console.log('Done!');
}

await testMemory();
@ddelgrosso1 thanks for taking a look at it. Also, I like that clinic doctor tool. I'm going to have to start using that :)
I copied your test and ran it locally. I am using a ~17k line file. I removed the rand part and just set it to 2, since that is what I am doing in my use case (I have to read enough to know how the file is delimited and whether it has more than just a header).
I think it has less to do with the number of lines read and more to do with opening the stream and not reading it to completion. I wonder if your spikes and drops come from the lines being read into memory and that array then being dumped, which distracts from the slow growth that still appears in the baseline of the test.
I ran the test for 2k iterations (which took a hot minute btw...). Here is the chart from that run:
@erik-induro thanks for the update. I will modify and try your suggestion tomorrow and let you know what turns up.
I believe I know what is going on, but I am still trying to definitively confirm it with further tests. When .destroy() is called on the only partially consumed stream, it causes an ERR_STREAM_PREMATURE_CLOSE to get propagated. Since under the hood we are using a pipeline, this may be causing leaked event handlers.
So my previous comment was partially wrong. In this situation, when the stream isn't fully consumed and is destroyed, it does indeed cause an ERR_STREAM_PREMATURE_CLOSE error to get propagated. The streams in the chain were getting destroyed correctly. However, I noticed that the underlying connections weren't getting closed. I should have noticed this from the clinic doctor graph: when running 2000 iterations, there were suddenly 2000 active handles.
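The premature-close propagation described above can be seen with plain Node streams, independent of the storage library. A minimal sketch (no GCS involved; the endless source stands in for a partially consumed read stream):

```javascript
import { pipeline, PassThrough, Readable } from 'node:stream';

let seenCode = null;

// An endless source piped into a destination that we destroy before
// the source ends, mimicking destroy() on a half-read stream.
const source = new Readable({ read() { this.push('data'); } });
const dest = new PassThrough();

pipeline(source, dest, (err) => {
  // Destroying a stage early surfaces ERR_STREAM_PREMATURE_CLOSE
  // in the pipeline's completion callback.
  seenCode = err?.code ?? null;
  console.log(seenCode);
});

dest.destroy();
```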
I believe this is the same issue raised here: https://github.com/node-fetch/node-fetch/issues/1762. To fix it, I added code to the pipeline handler in createReadStream to destroy the HTTP / HTTPS agent.
After adding the code to destroy the agent I ran another 2000 iteration test.
Now both memory and active handles look much more reasonable.
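For reference, a rough sketch of what destroying the agent involves. The actual fix lives inside createReadStream's pipeline error handler; this only shows the Node Agent API being called, with keepAlive as an assumed configuration:

```javascript
import https from 'node:https';

// A keep-alive agent holds sockets open between requests; these are
// the kind of "active handles" that were accumulating per iteration.
const agent = new https.Agent({ keepAlive: true });

// agent.destroy() closes any sockets currently in use by the agent,
// releasing their handles so the process can reclaim them.
agent.destroy();
console.log(Object.keys(agent.sockets).length);
```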
Environment details

@google-cloud/storage version: 6.9.5

Steps to reproduce

The following script reads the first N lines of a file and returns them. I rewrote this same function with multiple 3rd-party libraries and other native approaches; the issue is not the lineReader. Even though I call destroy(), it does not sufficiently clean up the resources. The heap size continues to increase on each execution until the memory on the server pops. I profiled this using node --inspect and saw Typed Arrays increase on each run and never decrease.

My workaround was to specify a smaller portion of the file to pull in, read it to completion, and extract the lines I care about. When I did this and profiled, I saw that Typed Arrays stayed a constant size.

Expected

I expect destroy() to clean up the resources for the stream when it is not read to the end, so that it does not cause a memory leak.
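A sketch of that workaround, written against a generic Readable so it is self-contained. With GCS you would pass something like file.createReadStream({ start: 0, end: 65535 }) so only a bounded byte range is fetched; readFirstLines and the demo input are illustrative names, not the library's API:

```javascript
import readline from 'node:readline';
import { Readable } from 'node:stream';

// Read a bounded stream to completion and keep only the first
// `numLines` lines; nothing is destroy()ed mid-stream, so no
// premature-close error is ever raised.
async function readFirstLines(stream, numLines) {
  const lines = [];
  const rl = readline.createInterface({ input: stream });
  for await (const line of rl) {
    if (lines.length < numLines) lines.push(line);
    // keep iterating so the (small) range drains completely
  }
  return lines;
}

// Works with any Readable:
const demo = await readFirstLines(Readable.from('header\nrow1\nrow2\n'), 2);
console.log(demo); // [ 'header', 'row1' ]
```

The trade-off is downloading the whole (bounded) range even when only a couple of lines are needed, which is cheap when the range is small.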