nathanpeck / s3-upload-stream

A Node.js module for streaming data to Amazon S3 via the multipart upload API
MIT License
347 stars 46 forks

Stream speed issues #41

Open voronianski opened 9 years ago

voronianski commented 9 years ago

I'm using s3-upload-stream together with Busboy in order to stream files directly to S3 storage without saving them to a temporary directory on the server.

The problem is upload speed: for example, it takes 1943 ms to upload a 26 KB image, which seems really slow. My network itself is fast (speed test screenshot omitted).

The code looks similar to:

const AWS = require('aws-sdk');
const Busboy = require('busboy');
const uuid = require('uuid');

const s3UploadStream = require('s3-upload-stream')(new AWS.S3());

const busboy = new Busboy({ headers: req.headers });
busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
  // NOTE: file is a ReadableStream

  const awsFileName = uuid.v4();

  // create a writable stream that uploads to S3 as it receives data
  const upload = s3UploadStream.upload({
    Bucket: config.s3.bucket,
    Key: awsFileName,
    ContentType: mimetype
  });

  // the 'uploaded' and 'error' events are emitted by the upload stream, not the file
  upload.on('uploaded', () => ...);
  upload.on('error', () => ...);
  file.pipe(upload);
});

busboy.on('error', () => ...);

req.pipe(busboy);

Has anybody encountered such an issue?

nathanpeck commented 9 years ago

This module is designed for much larger files, and uses an approach that, for your use case, will require three separate requests to upload your 26 KB file:

1) establishing the multipart upload with S3
2) uploading a single 26 KB part
3) asking S3 to assemble the file from that single part
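
For reference, here is roughly what those three requests look like as raw aws-sdk calls. This is a hedged sketch (bucket, key, and smallBuffer are placeholder names), not this module's actual source:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// 1) establish the multipart upload
s3.createMultipartUpload({ Bucket: bucket, Key: key }, (err, mpu) => {
  if (err) throw err;
  // 2) upload the single 26 KB part
  s3.uploadPart({
    Bucket: bucket,
    Key: key,
    UploadId: mpu.UploadId,
    PartNumber: 1,
    Body: smallBuffer // the whole file fits in one part
  }, (err, part) => {
    if (err) throw err;
    // 3) ask S3 to assemble the file from that single part
    s3.completeMultipartUpload({
      Bucket: bucket,
      Key: key,
      UploadId: mpu.UploadId,
      MultipartUpload: { Parts: [{ ETag: part.ETag, PartNumber: 1 }] }
    }, (err, data) => {
      // three round trips for one tiny file
    });
  });
});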

This is where the extra time comes from. Basically, if your file is only 26 KB you should use a traditional upload stream (just open a normal single POST request and pipe your 26 KB file stream into it) rather than this module.

This module is designed for workflows that upload massive, multi-GB/TB streams that may not even fit fully into memory. Under those conditions you need this approach, which uploads the file over many separate requests as concurrent 5 MB parts.

If you have files of a variety of sizes and need to upload both huge files and small ones, I'd recommend developing an adaptive system that uses this module for anything greater than, say, 50 MB, while uploading things smaller than that with a single POST request to S3. That way you get the speedy uploads for the small files, and the efficient, low-memory MPU approach for the large files. A rough sketch of this follows below.
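
A minimal sketch of such an adaptive system, assuming the aws-sdk v2 client; the 50 MB threshold and the adaptiveUpload() helper are illustrative names, not part of this module's API. The idea: buffer the incoming stream up to the threshold; if the stream ends first, send one putObject() request; otherwise replay the buffered bytes into a multipart upload stream.

const AWS = require('aws-sdk');
const s3Client = new AWS.S3();
const s3Stream = require('s3-upload-stream')(s3Client);

const THRESHOLD = 50 * 1024 * 1024; // 50 MB (illustrative)

function adaptiveUpload(file, params, callback) {
  const chunks = [];
  let size = 0;
  let switched = false;

  file.on('data', (chunk) => {
    if (switched) return;
    chunks.push(chunk);
    size += chunk.length;
    if (size > THRESHOLD) {
      // Too big for one request: replay the buffered bytes into a
      // multipart upload stream, then pipe the rest of the file in.
      switched = true;
      const upload = s3Stream.upload(params);
      upload.on('uploaded', (details) => callback(null, details));
      upload.on('error', callback);
      upload.write(Buffer.concat(chunks));
      file.pipe(upload);
    }
  });

  file.on('end', () => {
    if (switched) return;
    // Small file: a single request avoids the multipart handshake entirely.
    s3Client.putObject(Object.assign({ Body: Buffer.concat(chunks) }, params), callback);
  });

  file.on('error', callback);
}

The trade-off is that small files are buffered fully in memory, which is exactly what keeps them off the slow multipart path.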

voronianski commented 9 years ago

@nathanpeck thanks!

Regarding:

Basically, if your file is only 26 KB you should use a traditional upload stream (just open a normal single POST request and pipe your 26 KB file stream into it) rather than this module.

How do you recommend streaming small files without saving them temporarily? We tried knox, which has a putStream method, but there seems to be no way to get a file's mimetype before streaming without saving it first... or maybe I'm missing something?

nathanpeck commented 9 years ago

It is possible to determine the mimetype prior to saving. Most mimetype solutions, like this one, just do a simple lookup based on the file extension, and you should be able to know the filename and extension prior to uploading.
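
For example, with the mime-types package (an assumption; any extension-lookup library behaves similarly), Busboy hands you the filename before any bytes are streamed, so the ContentType can be resolved up front:

const mime = require('mime-types');

// lookup() maps an extension or filename to a mimetype string,
// or returns false if the extension is unknown
const mimetype = mime.lookup('photo.JPG') || 'application/octet-stream';
// -> 'image/jpeg'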

If you need deeper inspection of the file (perhaps because you don't trust people to put proper extensions on their files), then something like exiftool can be used to inspect the contents of the stream (ignoring the extension) and look for the magic bytes that show whether the file is an EXE, PNG, JPG, MP4, etc.
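
As a rough illustration of the magic-bytes idea (the signature table here is deliberately partial, and sniffFirstChunk is a hypothetical helper, not a real library call):

function sniffFirstChunk(file, done) {
  file.once('data', (chunk) => {
    file.pause(); // stop the flow until the caller decides what to do
    // assumes the first chunk is at least 8 bytes long
    let type = 'application/octet-stream';
    if (chunk.slice(0, 8).equals(Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]))) {
      type = 'image/png';
    } else if (chunk[0] === 0xff && chunk[1] === 0xd8 && chunk[2] === 0xff) {
      type = 'image/jpeg';
    } else if (chunk.slice(0, 2).toString('ascii') === 'MZ') {
      type = 'application/x-msdownload'; // EXE
    } else if (chunk.slice(4, 8).toString('ascii') === 'ftyp') {
      type = 'video/mp4';
    }
    // the caller must re-emit/prepend this chunk before piping onward
    done(type, chunk);
  });
}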

voronianski commented 9 years ago

@nathanpeck thanks for the info!

voronianski commented 9 years ago

@nathanpeck we ran a quick speed comparison between your s3-upload-stream and the plain official AWS SDK.

With s3-upload-stream:

// resolve/reject come from an enclosing new Promise((resolve, reject) => { ... })
let s3Stream = s3UploadStream(new AWS.S3());
let busboy = new Busboy({ headers: req.headers });

busboy.on('file', (fieldname, file, filename, encoding, mimetype) => {
  let upload = s3Stream.upload({
    Bucket: s3.bucket,
    Key: filePath,
    ContentType: mimetype
  });
  upload.on('uploaded', details => resolve(details));
  upload.on('error', error => reject(error));

  file.pipe(upload);
});
busboy.on('error', error => reject(error));

req.pipe(busboy);
and with aws-sdk-js:

let busboy = new Busboy({ headers: req.headers });
busboy.on('file', (fieldname, file) => {
  let params = {
    Bucket: s3.bucket,
    Key: filePath
  };
  // S3.upload() handles the buffering/multipart decision internally
  new AWS.S3({ params }).upload({ Body: file }).send((err, data) => {
    err ? reject(err) : resolve(data);
  });
});
busboy.on('error', error => reject(error));

req.pipe(busboy);

The uploaded file size was 171 MB (timing results screenshot omitted).

So it took more than 2x the time to upload the same file with this module..

nathanpeck commented 9 years ago

Interesting. Well, I may take a look to see if there are any optimizations to be made, but overall I'd recommend using S3.upload() then.

I made this module nearly two years ago before that method was added to the Node.js SDK, and the only way to upload files to S3 at that time was using putObject() with a buffer. You are probably better off using the official one from AWS now.

In fact, I'm going to update the readme to direct people toward that method.