node-formidable / formidable

The most used, flexible, fast and streaming parser for multipart form data. Supports uploading to serverless environments, AWS S3, Azure, GCP or the filesystem. Used in production.
MIT License
7k stars 680 forks source link

Files parsed by `IncomingForm().parse` are not automatically cleaned up or deleted. #967

Closed theGOTOguy closed 5 months ago

theGOTOguy commented 5 months ago

Support plan

Context

What are you trying to achieve or the steps to reproduce?

Parsed files are not cleaned up automatically and lead to a memory leak in production. I couldn't find this behavior documented anywhere, so I'm unsure whether it's expected behavior that needs to be documented or whether it's a bug and parsed files should be deleted automatically.

export default async function handler(req, res) {
  // snip...
  let form = new formidable.IncomingForm();
  try {
    let formfields = await new Promise(function (resolve, reject) {
      form.parse(req, function (err, fields, files) {
          if (files?.file) {
            jsonStream = fs.createReadStream(files.file.filepath).pipe(ndjson.parse());
            in_file = files.file.filepath;
          }

          resolve(fields);
      });
    });
    // snip...
  } finally {
    // My first attempt to fix the memory leak.  This didn't work.
    if (jsonStream) {
      jsonStream.destroy();
    }

    // Without what follows, a memory leak appears.
    if (in_file) {
      fs.unlink(in_file, (err) => {
        if (err) {
          // Whatever.  Maybe something else cleaned it up.
          return
        }
      })
    }
  }
}

What was the result you got?

Screenshot from 2024-01-20 16-39-58

The effect is best seen in the production graphs. You can see the container reboot due to an out-of-memory killer on the left, then you can see a reboot where I attempted to fix the leak by adding the jsonStream.destroy() logic, and then finally the noted fix where I use fs.unlink to delete the files emitted by form.parse.

What result did you expect?

I expected that the parsed files would be automatically cleaned up when form or formfields went out of scope. I have to use additional logic to separately delete them or a memory leak occurs.

GrosSacASac commented 5 months ago

Files are not removed

bensonk commented 5 months ago

Can you help me understand how keeping files on disk means it's necessary to keep the data in memory? Keeping files seems super reasonable. Keeping things in memory seems somewhat less reasonable.

theGOTOguy commented 5 months ago

@bensonk In this case, I believe the reason is that the files are stored in memory as an artifact of the host (Cloud Run).

@GrosSacASac Is this working as intended, then, or is this a bug? If it's working as intended, the examples should probably be updated so that they clean up the files when they're done, because if not I imagine that a lot of your users made the same mistake and are experiencing this same disk space or memory leak situation without knowing it.

GrosSacASac commented 5 months ago

If I upload a file I want it to stay

theGOTOguy commented 5 months ago

Ok, I see that the behavior is documented: https://github.com/node-formidable/formidable/blob/master/README.md#highlights

I was text searching for different terms. From the perspective of my pipeline which consumes but has no need to store the files, this behavior is surprising but I can see how it would make sense from a different perspective.

tunnckoCore commented 3 months ago

@theGOTOguy Parsed files are not cleaned up automatically and lead to a memory leak in production. I couldn't find this behavior documented anywhere, so I'm unsure whether it's expected behavior that needs to be documented or whether it's a bug and parsed files should be deleted automatically.

They are not automatically cleaned up, yeah. That's a user-land thing. There's not-writing-to-disk option and you should use it instead. (fileWriteStreamHandler)

I don't see it necessarily as memory leak or anything. But good point, maybe we should document it.