thejoshwolfe / yazl

yet another zip library for node
MIT License

Memory leak! #52

Closed taoqf closed 5 years ago

taoqf commented 5 years ago

It seems the memory is leaking, especially when calling addBuffer.

thejoshwolfe commented 5 years ago

Can you give some example code that demonstrates the memory leak?

taoqf commented 5 years ago

The usage example in the readme will do.

thejoshwolfe commented 5 years ago

When I run the usage example, it does not leak memory, because the node process does a finite sequence of things and then exits. There would need to be some kind of long-running server or loop in order to observe a memory leak.

taoqf commented 5 years ago

You can use setInterval, and watch the memory usage.
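As a minimal sketch of what that kind of test harness could look like (this code is not from the thread; the work being repeated is a placeholder), memory can be sampled with process.memoryUsage() while the loop runs:

```javascript
// Sketch: repeat some work on a setInterval and periodically log heap usage.
// The actual work under test (e.g. a yazl zip cycle) would go where indicated.
function formatMiB(bytes) {
  // Convert a byte count to a human-readable mebibyte string.
  return (bytes / 1024 / 1024).toFixed(1) + " MiB";
}

let ticks = 0;
const timer = setInterval(() => {
  // ... do the work under test here ...
  ticks++;
  if (ticks % 10 === 0) {
    const { heapUsed } = process.memoryUsage();
    console.log("tick", ticks, "heapUsed:", formatMiB(heapUsed));
  }
  if (ticks >= 30) clearInterval(timer); // stop so this sketch eventually exits
}, 0);
```

If heapUsed grows without bound across many ticks, that is evidence of a leak; a flat or sawtooth pattern (rising, then dropping after garbage collection) is normal.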

andrewrk commented 5 years ago

@taoqf did you do that? In that case post your results. If you did not do that, why do you think there is a memory leak?

thejoshwolfe commented 5 years ago

I ran this code:

var yazl = require("yazl");
var fs = require("fs");

var iterations = 0;
function iterate() {
  var zipfile = new yazl.ZipFile();
  zipfile.outputStream.pipe(fs.createWriteStream("output.zip")).on("close", function() {
    iterations++;
    if (/^[0-9]0+$/.test(iterations.toString())) {
      console.log(iterations);
    }
  });
  zipfile.addBuffer(Buffer.from("hello"), "hello.txt");
  zipfile.end();
}

setInterval(iterate, 0);

It has reached 200,000 iterations, and I do not see any noticeable increase in memory usage over time.

Please reopen this issue with actual code that demonstrates the problem.

taoqf commented 5 years ago

Maybe you should try a large file. I can give you a test; run it and you will see the memory increasing.

const { ZipFile } = require('yazl');
const { readFileSync } = require('fs');

let time = 0;
function zip(...files) {
    console.debug(++time);
    return new Promise((resolve, _reject) => {
        const zipfile = new ZipFile();
        files.forEach((f) => {
            zipfile.addBuffer(f.file, f.path);
        });
        const data = [];
        zipfile.outputStream.on('data', (chunk) => {
            data.push(chunk);
        });
        zipfile.outputStream.on('end', () => {
            resolve(Buffer.concat(data));
        });
        zipfile.end();
    });
}

const file = readFileSync('./a.pdf');
setInterval(async () => {
    await zip({
        file,
        path: './a.pdf'
    });
}, 0);

(attachment: a.pdf)

It has been such a long time that I switched to jszip, and it works fine.

taoqf commented 5 years ago

@andrewrk @thejoshwolfe

thejoshwolfe commented 5 years ago

I think I found the source of the memory leak in that example: you're starting too many parallel operations at once.

I added some logging to your code and changed the setInterval delay to 200:

const { ZipFile } = require('yazl');
const { readFileSync } = require('fs');

let started = 0;
let ended = 0;
function zip(...files) {
    started++;
    console.debug("started:", started, ", ended:", ended, ", in progress:", started - ended);
    return new Promise((resolve, _reject) => {
        const zipfile = new ZipFile();
        files.forEach((f) => {
            zipfile.addBuffer(f.file, f.path);
        });
        const data = [];
        zipfile.outputStream.on('data', (chunk) => {
            data.push(chunk);
        });
        zipfile.outputStream.on('end', () => {
            ended++;
            resolve(Buffer.concat(data));
        });
        zipfile.end();
    });
}

const file = readFileSync('./a.pdf');
setInterval(async () => {
    await zip({
        file,
        path: './a.pdf'
    });
}, 200);

Here is some example output with the 200 millisecond delay:

started: 1 , ended: 0 , in progress: 1
started: 2 , ended: 0 , in progress: 2
started: 3 , ended: 0 , in progress: 3
started: 4 , ended: 2 , in progress: 2
started: 5 , ended: 3 , in progress: 2
started: 6 , ended: 4 , in progress: 2
started: 7 , ended: 5 , in progress: 2
started: 8 , ended: 6 , in progress: 2
started: 9 , ended: 7 , in progress: 2
started: 10 , ended: 8 , in progress: 2
started: 11 , ended: 9 , in progress: 2
started: 12 , ended: 10 , in progress: 2
started: 13 , ended: 11 , in progress: 2
started: 14 , ended: 12 , in progress: 2
started: 15 , ended: 13 , in progress: 2
started: 16 , ended: 14 , in progress: 2
started: 17 , ended: 15 , in progress: 2
started: 18 , ended: 16 , in progress: 2

Now here's what happens when I change the 200 millisecond delay to 0, as in your original code:

started: 1 , ended: 0 , in progress: 1
started: 2 , ended: 0 , in progress: 2
started: 3 , ended: 0 , in progress: 3
started: 4 , ended: 0 , in progress: 4
started: 5 , ended: 0 , in progress: 5
started: 6 , ended: 0 , in progress: 6
started: 7 , ended: 0 , in progress: 7
started: 8 , ended: 0 , in progress: 8
started: 9 , ended: 0 , in progress: 9
started: 10 , ended: 0 , in progress: 10
started: 11 , ended: 0 , in progress: 11
started: 12 , ended: 0 , in progress: 12
started: 13 , ended: 0 , in progress: 13
started: 14 , ended: 0 , in progress: 14
started: 15 , ended: 0 , in progress: 15
started: 16 , ended: 0 , in progress: 16
started: 17 , ended: 0 , in progress: 17
started: 18 , ended: 0 , in progress: 18

All those in-progress operations use up memory, and that's not really yazl's fault. There could be another bug that really does cause a memory leak, but this example doesn't show one. When I ran the 200 millisecond delay code above, I did not observe any increase in RAM usage over time.
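One way to bound the concurrency (my sketch, not code from the thread) is to await each zip before scheduling the next, instead of firing a new one from setInterval regardless of whether the previous one finished:

```javascript
// Sketch: run the zip cycles sequentially so at most one is in progress.
// `zip` is assumed to be a promise-returning function like the one in the
// comments above; `file` is the Buffer to archive.
async function runSequentially(zip, file, cycles) {
  for (let i = 0; i < cycles; i++) {
    // Each iteration waits for the previous archive to finish before
    // starting the next, so buffered output chunks cannot pile up.
    await zip({ file, path: "./a.pdf" });
  }
}
```

With this pattern, the "in progress" count from the logging above would stay at 1 no matter how slow the machine is.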