muxinc / upchunk

Uploads Chunks! Takes big files, splits them up, then uploads each one with care (and PUT requests).
MIT License

feat: Refactor upchunk to use readable streams for memory usage improvement #95

Closed cjpillsbury closed 1 year ago

cjpillsbury commented 1 year ago

Overview

This PR is a rearchitecture of UpChunk to use ReadableStreams as the basis for reading bytes from a file. The current implementation relies on loading the entire file into the JavaScript runtime heap. The new architecture reduces the memory footprint to roughly the bytes returned by a single read() from the file's ReadableStream, plus any bytes left over from the previous read that have yet to be uploaded.
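As a rough illustration of the idea above (not the code in this PR), here is a minimal sketch of re-chunking a byte stream with an async generator. The names `ByteReader` and `chunkBytes` are hypothetical and exist only for this example; the sketch assumes the reader behaves like a `ReadableStreamDefaultReader<Uint8Array>`. At any point it holds only the latest read plus leftover bytes that did not fill a whole chunk.

```typescript
// Hypothetical minimal reader shape (an assumption for this sketch).
// A ReadableStreamDefaultReader<Uint8Array> satisfies it structurally.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
}

// Yields chunks of exactly `chunkSize` bytes; the final chunk may be shorter.
async function* chunkBytes(
  reader: ByteReader,
  chunkSize: number
): AsyncGenerator<Uint8Array> {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }

  // Bytes carried over from previous reads that have not been emitted yet.
  let pending = new Uint8Array(0);

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    if (!value || value.length === 0) continue;

    // Append the new read to the leftover bytes.
    const merged = new Uint8Array(pending.length + value.length);
    merged.set(pending, 0);
    merged.set(value, pending.length);
    pending = merged;

    // Emit every full chunk currently available; keep the remainder.
    while (pending.length >= chunkSize) {
      yield pending.slice(0, chunkSize);
      pending = pending.slice(chunkSize);
    }
  }

  // Flush whatever is left as the final (possibly short) chunk.
  if (pending.length > 0) yield pending;
}
```

In a browser, a consumer could obtain a reader with `file.stream().getReader()` and pull chunks one at a time with `for await (const chunk of chunkBytes(reader, size))`, uploading each chunk before requesting the next. The next read only happens when the consumer asks for another chunk.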

Why a separate AsyncIterable class (ChunkedStreamIterable)?

While the code could have been entirely written inline as part of the UpChunk instance, creating a class that conforms to ECMA standards for async iteration allows us to:

Why a "pull"-based use of streams (instead of a "push")?

Using things like pipes with backpressure can often be a very clean way to chain together discrete transformations and side effects when working with streams. Unfortunately, the current APIs don't have sufficient ways of cleanly handling:

Additional notes

Given the scope of changes here, additional tests have been added to validate that the "uploaded" files are identical (in bytes) to the files provided to UpChunk. Also, even though the API has only been changed in an additive way (e.g. adding off() and once() methods) and should be fully backwards compatible, this will likely be released as a major version change, due to the scope of the refactor.

resolves: #89