tus / tus-node-server

Node.js tus server, standalone or integrable in any framework, with disk, S3, Azure, and GGC stores.
https://tus.io/
MIT License
824 stars 201 forks source link

@s3/store: add backpressure + download incomplete part first #561

Closed fenos closed 9 months ago

fenos commented 9 months ago

This PR improves the handling of incomplete parts and adds backpressure by limiting the number of concurrent uploads/files on the disk that are being created.

Incomplete Parts

The incomplete part logic was somewhat not always working as intended.

An issue I came across with the old implementation is that when i upload a file using chunked upload from the client, and the chunk is bigger than the preferred size on the server, at times the offset returned wasn't calculated properly resulting in the subsequent request to return 409 (Offset-Conflict) very much related to #501

This PR fixes this issue, by always downloading the incomplete part first and prepending it to the readstream. By using this approach the offset is now calculated properly and also fixes #501 (see test)

Backpressure

Added support for backpressure. The way it works is by using a semaphore on the StreamSplitter. Only X number of chunks on disk can be created at any one time (default 30) and the semaphore is released upon finishing the part upload, meaning that the backpressure is handled by how fast S3 can upload parts.

In the scenario when the limit of concurrent chunks has been reached, the default node mechanism of backpressure will kick in. This means that when the highwater mark on the readstream buffer is full, the stream will be paused until the flow of data continues and the semaphore has capacity.

This aligns with the issue #505

fenos commented 9 months ago

@Murderlon I've updated the Readme and added a detailed documentation on the maxConcurrentPartUploads flag.

I've also increased the value to 60 instead of 30 (~480MB), as this aligns better with tusd default. Tusd uses 50MB partSize and 10 maxConcurrentPartUploads totaling at 500MB.

Let me know if there is anything else, this change will be awesome to have in

Acconut commented 9 months ago

Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)

Murderlon commented 9 months ago

Regarding replacing internal code with dependencies, I would wait for a comment from @Murderlon. Maybe his position is different than mine :)

I'd personally would avoid adding a dependency if it's not much code, we likely rarely/never have to change it, and we don't foresee complex side-effects. So I don't think we need a package to concat streams or unique tmp files. For the semaphore I don't have a strong opinion, in this case it's a well tested copy paste from Shopify so from a trust perspective I'd say it's equal (or higher) to a package. But if there is a similar well tested package with a small footprint I would be okay with that too. I'm okay with either.

Acconut commented 9 months ago

Alright, if you prefer less dependencies, we can go with that route as well. That's probably a topic where personal preferences differ.

Murderlon commented 9 months ago

I just saw that @shopify/semaphore is available as a package. In that case I agree with @Acconut that a dependency is better.

socket-security[bot] commented 9 months ago

New dependencies detected. Learn more about Socket for GitHub ↗︎

Package New capabilities Transitives Size Publisher
npm/@shopify/semaphore@3.0.2 None 0 8.77 kB shopify-dep
npm/@types/multistream@4.1.3 None +1 764 kB types
npm/multistream@4.1.0 Transitive: environment +7 196 kB feross

View full report↗︎

fenos commented 9 months ago

Hello @Murderlon @Acconut

Sorry for the delay on this, has been very busy last week. I've made all the changes we discussed on this PR here is the breakdown of my latest commit:

Let me know if there is anything else

fenos commented 9 months ago

@Murderlon all ready