Closed fenos closed 9 months ago
@Murderlon I've updated the Readme and added detailed documentation on the `maxConcurrentPartUploads` flag.
I've also increased the value from 30 to 60 (~480MB), as this aligns better with the tusd default: tusd uses a 50MB partSize and 10 maxConcurrentPartUploads, totaling 500MB.
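For context, a minimal configuration sketch (the option names follow this PR's description; the exact `S3Store` constructor shape in `@tus/s3-store` may differ):

```ts
import {S3Store} from '@tus/s3-store'

// Illustrative values: with an 8MB partSize, 60 concurrent part uploads
// allow roughly 60 * 8MB ≈ 480MB of part files on disk at any one time.
const store = new S3Store({
  partSize: 8 * 1024 * 1024,
  maxConcurrentPartUploads: 60,
  s3ClientConfig: {
    bucket: process.env.AWS_BUCKET as string,
    region: process.env.AWS_REGION,
    credentials: {
      accessKeyId: process.env.AWS_ACCESS_KEY_ID as string,
      secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY as string,
    },
  },
})
```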
Let me know if there is anything else. This change will be awesome to have in!
Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)
I'd personally avoid adding a dependency if it's not much code, we likely rarely/never have to change it, and we don't foresee complex side effects. So I don't think we need a package to concat streams or create unique tmp files. For the semaphore I don't have a strong opinion: in this case it's a well-tested copy-paste from Shopify, so from a trust perspective I'd say it's equal (or higher) to a package. But if there is a similar well-tested package with a small footprint I would be okay with that too. I'm okay with either.
Alright, if you prefer fewer dependencies, we can go that route as well. That's probably a topic where personal preferences differ.
I just saw that `@shopify/semaphore` is available as a package. In that case I agree with @Acconut that a dependency is better.
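For reference, the package's acquire/release usage looks roughly like this (a minimal sketch, not code from this PR):

```ts
import {Semaphore} from '@shopify/semaphore'

// Allow at most 3 tasks to run concurrently; acquire() resolves with a
// permit once capacity is available, and release() frees it again.
const semaphore = new Semaphore(3)

async function withPermit<T>(task: () => Promise<T>): Promise<T> {
  const permit = await semaphore.acquire()
  try {
    return await task()
  } finally {
    await permit.release()
  }
}
```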
New dependencies detected.
| Package | New capabilities | Transitives | Size | Publisher |
|---|---|---|---|---|
| npm/@shopify/semaphore@3.0.2 | None | 0 | 8.77 kB | shopify-dep |
| npm/@types/multistream@4.1.3 | None | +1 | 764 kB | types |
| npm/multistream@4.1.0 | Transitive: environment | +7 | 196 kB | feross |
Hello @Murderlon @Acconut
Sorry for the delay on this; it has been a very busy week. I've made all the changes we discussed on this PR. Here is the breakdown of my latest commit:
- Added `@shopify/semaphore` as a dependency.
- Added the `multistream` dependency to handle stream concatenation (see the sketch below). The reasoning behind this is that my implementation might have missed some edge cases that I'm not aware of. This library has been battle-tested for years and it seems good to rely on it for concatenating streams.

Let me know if there is anything else.
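As a rough illustration of what `multistream` provides (the helper name is hypothetical, not the PR's actual code):

```ts
import MultiStream from 'multistream'
import {createReadStream} from 'node:fs'
import {Readable} from 'node:stream'

// Concatenate the stored incomplete part with the incoming request body
// into a single readable stream, preserving order.
function concatStreams(incompletePartPath: string, incoming: Readable): Readable {
  return new MultiStream([createReadStream(incompletePartPath), incoming])
}
```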
@Murderlon all ready
This PR improves the handling of incomplete parts and adds backpressure by limiting the number of concurrent part files that are created on disk.
Incomplete Parts
The incomplete part logic was not always working as intended.
An issue I came across with the old implementation is that when I upload a file using chunked uploads from the client, and the chunk is bigger than the preferred part size on the server, the returned offset was at times calculated incorrectly, causing the subsequent request to return 409 (Offset-Conflict). This is very much related to #501.

This PR fixes the issue by always downloading the incomplete part first and prepending it to the read stream. With this approach the offset is now calculated correctly, which also fixes #501 (see test).
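A hedged sketch of that idea, assuming a hypothetical `.part` object key for the incomplete part (the actual store logic differs):

```ts
import {S3Client, GetObjectCommand} from '@aws-sdk/client-s3'
import MultiStream from 'multistream'
import {Readable} from 'node:stream'

// Download the previously stored incomplete part and prepend it to the
// incoming data, so the parts written afterwards cover the full,
// correctly offset byte range.
async function prependIncompletePart(
  client: S3Client,
  bucket: string,
  key: string,
  incoming: Readable
): Promise<Readable> {
  const res = await client.send(
    new GetObjectCommand({Bucket: bucket, Key: `${key}.part`})
  )
  return new MultiStream([res.Body as Readable, incoming])
}
```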
Backpressure
Added support for backpressure. It works by using a semaphore on the StreamSplitter: only a limited number of chunks can exist on disk at any one time (default 30), and the semaphore is released once the part upload finishes, meaning backpressure is driven by how fast S3 can upload parts.
When the limit of concurrent chunks has been reached, Node's default backpressure mechanism kicks in: once the read stream's highWaterMark buffer is full, the stream is paused until the semaphore has capacity again and the flow of data resumes.
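A simplified sketch of the mechanism (names such as `writePartToDisk` and `uploadPart` are placeholders, not the actual StreamSplitter API):

```ts
import {Semaphore} from '@shopify/semaphore'

// At most N part files may be on disk / in flight at once. While no permit
// is available we stop consuming the source, so Node's own backpressure
// pauses the read stream once its highWaterMark is reached.
const semaphore = new Semaphore(30)

async function splitAndUpload(
  source: AsyncIterable<Buffer>,
  writePartToDisk: (chunk: Buffer) => Promise<string>,
  uploadPart: (path: string) => Promise<void>
): Promise<void> {
  const inFlight: Promise<void>[] = []
  for await (const chunk of source) {
    const permit = await semaphore.acquire() // blocks when the limit is hit
    inFlight.push(
      (async () => {
        try {
          const path = await writePartToDisk(chunk)
          await uploadPart(path)
        } finally {
          await permit.release() // free capacity once S3 has the part
        }
      })()
    )
  }
  await Promise.all(inFlight)
}
```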
This aligns with issue #505.