nathanpeck / s3-upload-stream

A Node.js module for streaming data to Amazon S3 via the multipart upload API
MIT License

Adds support for pausing and resuming uploads across one or more sessions #25

Closed konklone closed 10 years ago

konklone commented 10 years ago

resume-and-pause

This PR adds the ability for the S3 upload stream to pause and resume. The same stream instance can be paused and then resumed, or a new stream instance can resume a prior session's multipart upload (given an upload ID and part data).

These features are designed to integrate nicely: a pause() call gives the integrator all the data they need to freeze the session and resume it in a later session.

Pausing and resuming a stream instance

https://github.com/konklone/s3-upload-stream/commit/2ce5c70acfc1e5be0462b7b1a6b8d1104da8bd9a adds pause() and resume() methods to the S3 upload stream.

The idea is that you call pause(), which emits a pausing event while it waits for any parts that are mid-upload to complete, then emits a paused event when it's done.

Calling resume() resumes reading from the input stream and uploading any queued part data, and emits an external resume event. It's safe to call resume() any time after pause(): if it's called between pausing and paused, the stream simply resumes and paused never fires.

An internal pause variable maintains state. Calling pause() on a dead or already-paused stream does nothing and returns false; calling resume() on an unpaused stream does nothing and returns false. Otherwise both methods return true.

When the paused event fires, it emits an object with the current UploadId and Parts array, so the caller can store this information and resume the upload in a later session.
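For illustration, here is a rough sketch of that flow. How the stream instance itself gets constructed is elided (it comes from the library's normal setup and isn't part of this PR), so treat everything other than the pause()/resume() methods and the pausing/paused/resume events described above as assumptions:

```js
var fs = require('fs');

// `uploadStream` is an s3-upload-stream instance created via the library's
// normal setup (connection and destination details), not shown here.
function uploadWithPauseSupport(uploadStream) {
  fs.createReadStream('/path/to/large-file').pipe(uploadStream);

  uploadStream.on('pausing', function () {
    // pause() was called; parts already mid-upload are still finishing.
    console.log('pausing: waiting for in-flight parts to finish');
  });

  uploadStream.on('paused', function (session) {
    // All in-flight parts are done. `session` contains the UploadId and Parts
    // array, which can be persisted to resume this upload in a later session.
    console.log('paused upload', session.UploadId,
                'after', session.Parts.length, 'parts');
  });

  uploadStream.on('resume', function () {
    console.log('upload resumed');
  });

  // Pause; returns false (and does nothing) if the stream is dead or already paused.
  uploadStream.pause();

  // Some time later, in the same process:
  uploadStream.resume(); // returns false if the stream isn't paused
}
```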

Resumption in a later session

https://github.com/konklone/s3-upload-stream/commit/6e13e79bdf6842df62f666c1895e452ca57ec3dd and https://github.com/konklone/s3-upload-stream/commit/0d1b9b131d0dfa1cadf28aef95a558cfeefb8771 add support for resuming a multipart upload with a new S3 upload stream instance.

The ready event now emits the current multipart upload ID, whether the stream is resuming a multipart upload or creating a new one.

The stream constructor now accepts an optional second parameter, sessionDetails: an object containing an UploadId and a Parts array, identical to what S3 requires when completing a multipart upload, and identical to what pause() eventually delivers to the user via the paused event.

An internal started variable maintains state. When the stream receives its first part's worth of data and would normally create a multipart upload, it now first checks whether multipartUploadId is already set and the stream has started. If so, it skips creating the multipart upload and goes straight to uploading parts.
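A sketch of what resuming in a fresh session could look like under these assumptions. `createUploadStream` is a hypothetical stand-in for however the stream constructor is invoked in your code; only the sessionDetails shape, the ready event, and the skipped create step are taken from this PR:

```js
// `createUploadStream` is a hypothetical stand-in for the library's stream
// constructor; savedUploadId and savedParts come from a previous session
// (for example, a 'paused' event that was persisted somewhere).
function resumeUpload(createUploadStream, savedUploadId, savedParts) {
  // The same shape S3 wants when completing a multipart upload, and the same
  // shape the paused event delivers.
  var sessionDetails = {
    UploadId: savedUploadId,
    Parts: savedParts // e.g. [{ ETag: '"..."', PartNumber: 1 }, ...], starting at 1 with no gaps
  };

  var uploadStream = createUploadStream(
    { Bucket: 'my-bucket', Key: 'my-key' }, // destination details
    sessionDetails                          // the optional second parameter this PR adds
  );

  uploadStream.on('ready', function (uploadId) {
    // Fires for new and resumed uploads alike. When resuming, uploadId equals
    // sessionDetails.UploadId, no new multipart upload is created, and the
    // next part is numbered sessionDetails.Parts.length + 1.
    console.log('resuming multipart upload', uploadId);
  });

  return uploadStream;
}
```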

This assumes the Parts array contains a contiguous set of parts: they don't have to be in order, but they must start at 1 and have no gaps. My understanding of S3's API and this library is that this will always be the case, even when processing/pausing/resuming multiple concurrent uploads.

Tests are included for resuming a multipart upload from session details. They're based on the tests for creating a new stream, but expect the upload ID that was passed in (not the one baked into the S3 mock) and a next part number one greater than the number of parts passed in.

Notes

I included tests for resumption in a later session, but not for pause() and resume(). I wasn't able to think of an easy way to mock out the pause/resume flow in the test suite as structured. If this is important, advice would be helpful on how to proceed. (It is working very well in empirical testing, but with the additional pause/pausing/resume surface area it's certainly possible I've missed something.)

I've added documentation to the README for pause(), resume(), and how to resume an upload stream.

This PR includes the 1.0.5 changes, in https://github.com/nathanpeck/s3-upload-stream/commit/932da15564bb61a18747a1e1869041401ef835ec. When 1.0.5 is merged to master separately, this PR should automatically shrink to only the remaining commits I've added.

konklone commented 10 years ago

I updated the PR to also emit the total number of bytes sent so far as part of the paused event data. This is needed to efficiently resume a file upload in another session.
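For instance (a sketch; the property name for the byte count and the stream construction are assumptions, not part of the library), a caller could use that figure to skip the already-uploaded portion of the source when it resumes:

```js
var fs = require('fs');

// `session` is the object from a previous 'paused' event, and `uploadStream`
// is a new instance resuming that session. `session.uploadedBytes` is a
// hypothetical name for the byte count mentioned above.
function pipeRemainder(session, uploadStream) {
  fs.createReadStream('/path/to/large-file', { start: session.uploadedBytes })
    .pipe(uploadStream);
}
```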

nathanpeck commented 10 years ago

Great work! I merged it in and published as 1.0.6

Thanks!

konklone commented 10 years ago

My pleasure, and thanks for merging this so quickly!

1N50MN14 commented 10 years ago

wohow! this is awesome @konklone !! Thanks!