mscdex / busboy

A streaming parser for HTML form data for node.js
MIT License
2.84k stars 213 forks source link

Filestreams never timeout #308

Closed UnlimitedBytes closed 2 years ago

UnlimitedBytes commented 2 years ago

Important notice I know the issue I'm going to describe in this request is fixable in userland. Anyway it would be nice to have busboy fix this as the ressources we're talking about are created by busboy and so should busboy manage cleaning up the produced mess instead of shipping this to the users.

Describe the bug Filestreams never get canceled when they're not receiving anymore data. This leads to a large amount of unneeded streams open.

Reproduction For the simplest example we take the little server from README.md and modify it a little so requests that take too long are canceled.

// server.js
import http from "http";
import busboy from "busboy";

http
  .createServer((request, response) => {
    if (request.method.toLowerCase() === "post") {
      const bb = busboy({ headers: request.headers });

      bb.on("error", (error) => {
        console.error(error);
      });

      bb.on("field", (name, value, info) => {
        console.log(`Field [${name}]: value: %j`, value);
      });

      bb.on("file", (name, stream, info) => {
        const { filename, encoding, mimeType } = info;
        console.log(
          `File [${name}]: filename: %j, encoding: %j, mimeType: %j`,
          filename,
          encoding,
          mimeType
        );

        stream.on("data", (data) => {
          console.log(`File [${name}] got ${data.length} bytes`);
        });

        stream.on("close", () => {
          console.log(`File [${name}] done`);
        });
      });

      bb.on("end", () => {
        console.log("Done parsing form!");
        response.writeHead(303, { Connection: "close", Location: "/" });
        response.end();
      });

      request.socket.setTimeout(15000, () => {
        console.log('Request timeout');
        request.socket.destroy()
      })

      request.pipe(bb);
    } else {
      response.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      response.write(
        `
        <form action="/" method="POST" enctype="multipart/form-data">
            <input type="file" name="file" /><br />
            <button type="submit">Upload</button>
        </form>
        `.trim()
      );
      response.end();
    }
  })
  .on("listening", () => {
    console.log("HTTP-Server is running.");
  })
  .listen(3001, "127.0.0.1");

And now we're sending a really slow request to the server

# generate some random data
head -c 2M </dev/urandom >somedata.bin

# send a really slow request
curl -F file=@somedata.bin --limit-rate 1K http://127.0.0.1:3001/

What we would expect to see is:

HTTP-Server is running.
File [file]: filename: "somedata.bin", encoding: "7bit", mimeType: "application/octet-stream"
File [file] got 65380 bytes
Request timeout
"Done parsing form!" or "ERROR:  ..."

which would indicate that the socket is closed and all resources are released. But what we actually see is:

HTTP-Server is running.
File [file]: filename: "somedata.bin", encoding: "7bit", mimeType: "application/octet-stream"
File [file] got 65380 bytes
Request timeout

which doesn't show any ending of the file stream that we opened. This so far can be considered a feature instead of a bug supporting slow connections (which is dangerous on its own tho due to a variation of Slowloris

But even if we are 100% sure no more data will arrive from the client (e.g. request.on('aborted', () => {}) busboy will not destroy the filestream. So if we send our slow request again and cancel it 2 seconds after we still have an open stream waiting for more data on the server side while the client is disconnected. Repeat this in a little script around a million times and the webserver crashes due to out of memory.

System Info

System:
    OS: Windows 10 10.0.19044
    CPU: (8) x64 Intel(R) Core(TM) i7-9700KF CPU @ 3.60GHz
    Memory: 11.25 GB / 31.92 GB
Binaries:
    Node: 18.4.0 - C:\Program Files\nodejs\node.EXE
    Yarn: 1.22.19 - ~\AppData\Roaming\npm\yarn.CMD
    npm: 8.13.1 - ~\AppData\Roaming\npm\npm.CMD

Severity annoyance

mscdex commented 2 years ago

This would ultimately be a node.js core issue as the request object is not properly destroyed. If it was, an error would be emitted, which would cause the pipe to break, which would cause busboy's destroy handler to get called, which would cause an 'error' to be emitted appropriately.

mscdex commented 2 years ago

FWIW a non-busboy test case would be:

const http = require('http');
const { Writable } = require('stream');

http
  .createServer((request, response) => {
    if (request.method.toLowerCase() === "post") {
      const writer = new Writable({
        write(data, enc, cb) {
          console.log('writer write()', data.length);
          cb();
        },
        final(cb) {
          console.log('writer final()');
          cb();
        },
        destroy(err, cb) {
          console.log('writer destroy()', err);
          cb();
        }
      });

      request.socket.setTimeout(15000, () => {
        console.log('Request timeout');
        request.socket.destroy();
      })

      request.pipe(writer);
      return;
    }
    response.end();
  })
  .on("listening", () => {
    console.log("HTTP-Server is running.");
  })
  .listen(3001, "127.0.0.1");

You will see that only write() is ever called.

mscdex commented 2 years ago

Interestingly require('stream').pipeline() makes things work as intended. I'm guessing it's because it listens for the 'aborted' event that gets emitted on request, whereas request.pipe() does not.

UnlimitedBytes commented 2 years ago

Hey, thanks for the fast response!

It is fixable by calling busboy's destroy like this

// server.js
import http from "http";
import busboy from "busboy";

http
  .createServer((request, response) => {
    if (request.method.toLowerCase() === "post") {
      const bb = busboy({ headers: request.headers });

      bb.on("error", (error) => {
        console.error('Error while parsing form', error);
      });

      bb.on("field", (name, value, info) => {
        console.log(`Field [${name}]: value: %j`, value);
      });

      bb.on("file", (name, stream, info) => {
        const { filename, encoding, mimeType } = info;
        console.log(
          `File [${name}]: filename: %j, encoding: %j, mimeType: %j`,
          filename,
          encoding,
          mimeType
        );

        stream.on('error', (error) => {
          console.error('Error while reading file stream', error)
        })

        stream.on("data", (data) => {
          console.log(`File [${name}] got ${data.length} bytes`);
        });

        stream.on("close", () => {
          console.log(`File [${name}] done`);
        });
      });

      bb.on("end", () => {
        console.log("Done parsing form!");
        response.writeHead(303, { Connection: "close", Location: "/" });
        response.end();
      });

      request.setTimeout(5000)

      request.on('aborted', () => {
        console.log('Request aborted');
        request.destroy();
        bb.destroy();
      })

      request.pipe(bb);
    } else {
      response.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
      response.write(
        `
        <form action="/" method="POST" enctype="multipart/form-data">
            <input type="file" name="file" /><br />
            <button type="submit">Upload</button>
        </form>
        `.trim()
      );
      response.end();
    }
  })
  .on("listening", () => {
    console.log("HTTP-Server is running.");
  })
  .listen(3001, "127.0.0.1");

but it would be nice if busboy would catch destruction of the parent stream as it is clear no more data will come in.

mscdex commented 2 years ago

but it would be nice if busboy would catch destruction of the parent stream as it is clear no more data will come in.

There is no real way for busboy to do that. pipe() and/or other stream infrastructure is responsible for the interconnection and proper handling of streams.

If you want to see pipe()'s behavior change, you will need to file an issue on the node.js core issue tracker. Or just use require('stream').pipeline() instead.

UnlimitedBytes commented 2 years ago

but it would be nice if busboy would catch destruction of the parent stream as it is clear no more data will come in.

There is no real way for busboy to do that. pipe() and/or other stream infrastructure is responsible for the interconnection and proper handling of streams.

If you want to see pipe()'s behavior change, you will need to file an issue on the node.js core issue tracker. Or just use require('stream').pipeline() instead.

As using import {pipeline} from 'stream'; insteadof request.pipe() does everything that was requested I'm closing this issue.

Thanks a lot for the help! ❤