jaydenseric / graphql-upload

Middleware and a scalar Upload to add support for GraphQL multipart requests (file uploads via queries and mutations) to various Node.js GraphQL servers.
https://npm.im/graphql-upload
MIT License

How to properly validate the file size of the upload? #204

Closed fabis94 closed 4 years ago

fabis94 commented 4 years ago

I've spent about an hour trying to find some resources about this, but couldn't find anything that actually explained how to validate the file size before uploading it to S3 or wherever. All of the file upload guides/tutorials conveniently skip the issue of file size validation.

So, how would I properly validate the size to make sure an absurdly big file isn't being POSTed? Sure, I know that I can limit the max file size for this entire lib, but that isn't granular enough. Some mutations might need to limit the max size at 1MB, and others might need to limit it at 100MB.

My only idea right now is to take the stream returned by createReadStream and read it to see how big it is, and then, if it's OK, create a new read stream for passing to S3 using createReadStream again.

Another relevant question: If I find the file size to be too big, how can I tell this package to clean up the huge file that's been written to the temp directory?

Thanks in advance

fabis94 commented 4 years ago

Ended up going with this:

  // Validate file size
  const uploadStream = upload.createReadStream();
  let byteLength = 0;
  for await (const uploadChunk of uploadStream) {
    byteLength += (uploadChunk as Buffer).byteLength;

    if (byteLength > MaxAllowedFileSize) {
      break;
    }
  }
  uploadStream.destroy();

It's only counting bytes, so it doesn't seem like it should cause any performance issues, even with larger files.

jaydenseric commented 4 years ago

I'm sorry, but I just don't have the availability to help answer this usage question. If I had some code examples handy I would share them, but I don't - in my APIs a global limit has been sufficient.

Feel free to keep the discussion going if you find a good solution, and anyone else chime in with tips.

I suggest trying to measure the stream as it is being used, as opposed to trying to buffer or store the entire upload locally to verify total size before streaming it out to the final destination. Pretty sure there are packages available to help with that, although it might be simple enough to do (depending on your Node.js version?) without a dependency.

fabis94 commented 4 years ago

I'm passing the read stream directly to the AWS S3 client, so I can't measure it as it's being used, and the S3 client doesn't offer any kind of file size validation on the client side.

jaydenseric commented 4 years ago

You can abort the stream when it approaches or goes over the limit, and either let the AWS SDK handle the error/cleanup or do some special handling. You need elegant error handling anyway, because a user's network could always drop out partway through the upload.

fabis94 commented 4 years ago

> You can abort the stream when it approaches or goes over the limit

Isn't that what I'm doing in my example? Or do you mean that I could somehow hook into the actual ReadStream object passed to AWS? Can a single ReadStream be used from 2 different locations at once?

I'm also not quite sure what you mean by elegant error handling; there's not much more I can do than wrap the S3 upload process in a try/catch and convert the error to something more human readable. If the network drops out halfway, then the file can't be uploaded and there's not much else that can be done.
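
For example, roughly like this (just a sketch; uploadBody stands in for whatever stream we pass to S3, and FileManagerError is our own error class):

  try {
    await s3client.upload({ Body: uploadBody, Bucket: X, Key: Y }).promise();
  } catch (err) {
    // Sketch only: translate the low-level AWS/stream error into something user-facing
    throw new FileManagerError("File upload failed, please try again.");
  }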

fabis94 commented 4 years ago

Improved my example so that the stream is uploaded & validated at the same time.

  // PassThrough comes from Node's built-in "stream" module
  // Create pass-through streams for validating file size while uploading to S3
  const validationStream = new PassThrough();
  const uploadStream = new PassThrough();

  let byteLength = 0;
  stream.pipe(validationStream).on("data", (data: Buffer) => {
    byteLength += data.byteLength;

    // Once the file gets too big, destroy the stream going to S3 to halt the upload
    if (byteLength > MaxImageFileSize) {
      uploadStream.destroy(
        new FileManagerError(`Upload exceeds the maximum allowed size!`)
      );
    }
  });
  stream.pipe(uploadStream);

  try {
    const result = await s3client.upload({ Body: uploadStream, Bucket: X, Key: Y }).promise();
  } finally {
    uploadStream.destroy();
    validationStream.destroy();
    stream.destroy();
  }

Not sure if I need to destroy all of the streams at the end, but I did it anyway to be safe.

Edit: There's an even better suggestion here: https://github.com/mike-marcacci/fs-capacitor/issues/27#issuecomment-631570106. Instead of two pass-through streams, you can create a special SizeValidatorStream, pipe the original stream through it, and then pass the validator stream into the S3 client.
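
A rough, untested sketch of what that SizeValidatorStream could look like (reusing the MaxImageFileSize and FileManagerError placeholders from above):

  import { Transform, TransformCallback } from "stream";

  // Counts bytes as they pass through and errors out once the limit is exceeded
  class SizeValidatorStream extends Transform {
    private bytesSeen = 0;

    constructor(private readonly maxBytes: number) {
      super();
    }

    _transform(chunk: Buffer, _encoding: BufferEncoding, callback: TransformCallback) {
      this.bytesSeen += chunk.byteLength;
      if (this.bytesSeen > this.maxBytes) {
        // Erroring here destroys this stream, which makes the piped S3 upload fail
        callback(new FileManagerError(`Upload exceeds the maximum allowed size!`));
      } else {
        callback(null, chunk);
      }
    }
  }

  // Usage: pipe the upload's stream through the validator and hand that to S3
  const validatorStream = stream.pipe(new SizeValidatorStream(MaxImageFileSize));
  const result = await s3client
    .upload({ Body: validatorStream, Bucket: X, Key: Y })
    .promise();

As with the pass-through version, the original stream may still need to be destroyed in a finally block, since pipe doesn't propagate errors back to the source.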

albertly commented 3 years ago

@fabis94 or somebody else: why do you need uploadStream?

Can you use const result = await s3client.upload({ Body: validationStream, Bucket: X, Key: Y }).promise(); instead?