This PR is a rearchitecture of Upchunk to use `ReadableStream`s as the basis for reading bytes from a file. Unlike the current implementation, which relies on loading the entire file into the JavaScript runtime heap, this new architecture allows us to reduce the memory footprint to a given `read()` from the file's `ReadableStream` (plus any remaining bytes from the previous read that have yet to be uploaded).
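To illustrate the model (not Upchunk's actual internals), a file's bytes can be consumed incrementally by pulling from its `ReadableStream` reader, so at most one `read()`'s worth of bytes is held at a time; `totalBytes` is a hypothetical helper name:

```typescript
// Sketch: incremental reads from a Blob/File's ReadableStream.
// Only one read()'s worth of bytes is held in memory per iteration,
// rather than the whole file.
async function totalBytes(blob: Blob): Promise<number> {
  const reader = blob.stream().getReader();
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done || !value) break;
    total += value.byteLength; // value is one Uint8Array chunk
  }
  return total;
}
```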
## Why a separate `AsyncIterable` class (`ChunkedStreamIterable`)?
While the code could have been written entirely inline as part of the `UpChunk` instance, creating a class that conforms to the ECMAScript async iteration protocol allows us to:
- Take advantage of language constructs like async iterators, which make the asynchronous, serial process of chunked uploading much simpler and easier to understand where it is used in `UpChunk::sendChunks()`.
- Have a clear separation of concerns, leaving room for more granular/unit-level testing and isolating some of the more complicated parts when refactoring or reasoning about functionality, since `ChunkedStreamIterable` shouldn't need to be refactored as frequently.
- Have a much easier path forward for plausible future features, such as direct `MediaRecorder` support.
- Have a much easier path forward for alternative applications of chunked uploads (e.g. WebSockets or WebRTC data channels, transforms, etc.).
## Why a "pull"-based use of streams (instead of a "push")?
Using constructs like pipes (`pipeTo()`/`pipeThrough()`) with backpressure can often be a very clean way to chain together discrete transformations and side effects when working with streams. Unfortunately, the current APIs don't have sufficient ways of cleanly handling:

- pauses (in a way that wouldn't result in file bytes accumulating in memory)
- dynamic queuing strategies to apply appropriate backpressure (e.g. with `dynamicChunkSize` enabled)
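The pull model sidesteps the pause problem: since the consumer decides when to call `read()`, pausing simply means not reading, and the unread bytes stay in the source rather than piling up in memory. A hedged, illustrative sketch (names like `consumeWithPause` are not Upchunk's actual API):

```typescript
// Pull-based consumption: while paused we stop calling read(), so no
// file bytes accumulate in memory; backpressure propagates to the source.
async function consumeWithPause(
  stream: ReadableStream<Uint8Array>,
  isPaused: () => boolean,
  onChunk: (chunk: Uint8Array) => Promise<void>
): Promise<void> {
  const reader = stream.getReader();
  while (true) {
    if (isPaused()) {
      // Wait without reading anything.
      await new Promise((resolve) => setTimeout(resolve, 50));
      continue;
    }
    const { done, value } = await reader.read();
    if (done || !value) return;
    await onChunk(value);
  }
}
```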
## Additional notes
Given the scope of changes here, additional tests have been added to validate that the "uploaded" files are byte-for-byte identical to the files provided to `UpChunk`. Also, even though the API has only been changed in an additive way (e.g. adding `off()` and `once()` methods) and should be fully backwards compatible, this will likely be released as a major version change due to the scope of the refactor.
resolves: #89