mscdex / busboy

A streaming parser for HTML form data for node.js
MIT License
2.86k stars 213 forks source link

Memory leaks in all the examples? #241

Closed manast closed 3 years ago

manast commented 3 years ago

This is probably more of a generic problem on most javascript libraries out there but I thought it is important to point it out. Namely that the examples in the documentation all creates memory leaks. For instance:

var http = require('http'),
    path = require('path'),
    os = require('os'),
    fs = require('fs');

var Busboy = require('busboy');

http.createServer(function(req, res) {
  if (req.method === 'POST') {
    var busboy = new Busboy({ headers: req.headers });
    busboy.on('file', function(fieldname, file, filename, encoding, mimetype) {
      var saveTo = path.join(os.tmpDir(), path.basename(fieldname));
      file.pipe(fs.createWriteStream(saveTo));
    });
    busboy.on('finish', function() {
      res.writeHead(200, { 'Connection': 'close' });
      res.end("That's all folks!");
    });
    return req.pipe(busboy);
  }
  res.writeHead(404);
  res.end();
}).listen(8000, function() {
  console.log('Listening for requests');
});

The problem is that a new instance of the Busboy class is instantiated everytime a request is handled. This instance sets a couple of event listeners ('file' and 'finish'), however these listeners are never removed, therefore the GC will not be able to free the busboy object ever since the listeners will keep listening for ever... An improvement here would be to remove the listeners in the 'finish' event handler, however this assumes that this event will always be triggered, but this may not be always the case. We could add a new event listener that listens for the 'error' event, and hopefully either 'error' or 'finish' will be called, however this needs to be guaranteed by the library, otherwise we are just assuming it and potential memory leaks still be there.

I know I am very late to the party but anyway, this is important in my opinion.

manast commented 3 years ago

Actually after giving more thought to it, maybe there are no memory leaks at all, and I did not make a proper analysis. For instance the reference that keeps the busboy instance in memory is:

req.pipe(busboy)

If req gets unreferenced when the stream has completed/failed, then the GC should be able to collect req as well as all the stuff req is referencing including the busboy instance and all their listeners.

I apologize if this is the case but I could not find any information about this in node's documentation.

mscdex commented 3 years ago

There is no problem with leaving event listeners on an emitter because the busboy is limited to the scope of the HTTP request handler. Eventually the GC will collect all of those resources: busboy, req, res, etc. Setting those variables explicitly to undefined or similar is unnecessary due to how GC works in JavaScript.

If your process is crashing with an OOM error, then the culprit is going to lie elsewhere as there is no leak with the code you've provided. If it's not crashing with an OOM error, then most likely what you're seeing is the laziness of the GC. It doesn't free memory ASAP. You can however force a GC to occur by passing '--expose-gc' via the command line and then calling global.gc() whenever you want. However, forcing the garbage collector to run is not what you should be doing. If your concern is total memory usage, there are ways to limit V8's total memory usage.