nextcloud / server


Chunked upload performance investigation #47682

Open provokateurin opened 1 week ago

provokateurin commented 1 week ago

Motivation

Chunked upload is useful in many cases, but also slower than uploading the entire file directly. Overall it is slower because multiple network requests have to be made. The speed of a request varies over its lifetime: it is slower at the beginning and then flattens out at the maximum over time. The smaller the requests are and the more of them we make, the worse the performance penalty. Thus, to increase chunked upload speed, the chunk size should be increased.

A single upload using cURL is the upper limit of the possible upload speed for any configuration and upload method. A chunked upload with the chunk size equal to or greater than the file size represents the upper limit for chunked uploads as it only uploads a single chunk. While reaching the former would be nice, only the latter is achievable (without general performance improvements in WebDAV regardless of the maximum chunk size) and thus represents the theoretical goal.

Testing methodology

Input

dd if=/dev/random of=1G.bin bs=1G count=1

Scenarios

All tests are run on a local instance using the PHP standalone web server with 10 workers and no extra apps enabled. The machine has a Ryzen 7 5800X (8 cores, 16 threads), 48GB RAM and a 1TB Samsung 980 NVMe M.2 SSD. Hardware should not be a bottleneck on this setup and external networking cannot have an effect either.
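(For reproducibility: the standalone server was presumably started along the lines of the command below; the document root is a placeholder, PHP_CLI_SERVER_WORKERS is the stock PHP mechanism for running the built-in server with multiple workers.)

PHP_CLI_SERVER_WORKERS=10 php -S localhost:8080 -t /path/to/nextcloud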

1. cURL single upload

Take the real value reported by time.

time curl -X PUT "http://localhost:8080/remote.php/webdav/1G.bin" -u admin:admin --upload-file 1G.bin

Runs: 5.412s 5.223s 5.100s Average: 5.245s

Note: I once saw an outlier that only took about 4.7s, but this never happened again.

Chunked upload via browser

Open the Firefox DevTools and filter the network requests by dav/uploads. Upload 1G.bin via the web interface. Take the Started timestamp of the first request (MKCOL) and the Downloaded timestamp of the last request (MOVE) and subtract them (see the Timings tab of each request). This includes some constant overhead for the MKCOL and MOVE requests, which is not relevant when comparing chunked upload timings with each other since they all have the same overhead, but when comparing against the cURL scenario it accurately measures the overall time of the upload process.

According to https://firefox-source-docs.mozilla.org/devtools-user/network_monitor/throttling/index.html "Wi-Fi" throttling means a maximum speed of 15 Mbps. Sadly this is the "fastest" speed one can select for throttling and there is no way to set a custom speed. It should represent a worst case; in the real world most uploads are probably done at 2-3x that speed if the Nextcloud instance is not on the same network.

Adjusting the default maximum chunk size can be done in https://github.com/nextcloud/server/blob/796405883d214e6e4f3fa1497c036828efee0d62/apps/files/lib/App.php#L45
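For testing this does not necessarily require a code change: assuming the linked default is just the fallback for the files app config value max_chunk_size, it should also be adjustable at runtime via occ, e.g. to 100 MiB:

occ config:app:set files max_chunk_size --value 104857600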

2. Chunk size 10MiB (current default), unlimited bandwidth

Chunks: 103 Runs: 47.16s 47.65s 47.33s Average: 47.38s

3. Chunk size 100MiB, unlimited bandwidth

Chunks: 11 Runs: 8.53s 8.64s 8.63s Average: 8.6s

4. Chunk size 1024MiB, unlimited bandwidth

Chunks: 1 Runs: 6.37s 6.34s 6.34s Average: 6.35s

5. Chunk size 10MiB (current default), throttled "Wi-Fi"

Chunks: 103 Runs: 551.40s 551.40s 551.40s Average: 551.40s

6. Chunk size 100MiB, throttled "Wi-Fi"

Chunks: 11 Runs: 552.60s 549.60s 551.40s Average: 551.2s

7. Chunk size 1024MiB, throttled "Wi-Fi"

Chunks: 1 Runs: 568.20s 555.60s 553.11s Average: 558.97s

Conclusions

  1. Upload speed in Nextcloud is very consistent regardless of upload method. Great!

  2. Chunked upload in general takes about 21% longer in scenarios with unlimited bandwidth (scenarios 1 and 4). Whether this overhead can be eliminated easily is not clear, but at least there is no hard limitation, since both uploads go through WebDAV and thus use the same underlying stack (also see the other interesting findings section below).

  3. In the current default configuration with unlimited bandwidth, chunked upload takes 646% longer than the maximum speed (scenarios 2 and 4). By increasing the chunk size 10x while keeping the bandwidth unlimited, it only takes 35% longer than the maximum speed (scenarios 3 and 4). This is a 5.5x increase in total throughput (scenarios 2 and 3); the arithmetic is spelled out below this list.

  4. In bandwidth-limited scenarios increasing the chunk size has almost no positive (and no negative) effect (scenarios 5 and 6). This is expected, as under throttling the slower speed at the beginning of each chunk is much closer to (or exactly the same as) the overall speed.

  5. Increasing the chunk size helps uploads on fast connections while having no downsides, speed-wise, on slow connections. However, slow networks often correlate with unstable networks, so having fewer and larger chunks could result in a higher rate of aborted chunk uploads. This downside should be taken into consideration when choosing a new maximum chunk size.

  6. A new maximum chunk size still needs to be determined by collecting more data for different chunk sizes. It needs to hit the sweet spot of maximum speed at minimum size to account for the aforementioned drawback on unstable networks (basically the point of diminishing returns). This investigation was only meant to prove that the chunked upload speed can be increased.
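For reference, the percentages above follow directly from the measured averages: 6.35s / 5.245s ≈ 1.21 (≈ 21% longer, scenario 4 vs 1), 47.38s / 6.35s ≈ 7.46 (≈ 646% longer, scenario 2 vs 4), 8.6s / 6.35s ≈ 1.35 (≈ 35% longer, scenario 3 vs 4) and 47.38s / 8.6s ≈ 5.5 (the 5.5x throughput increase, scenario 2 vs 3).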

Other interesting findings

While uploading with a single chunk and unlimited bandwidth, Firefox displayed that the request needed 637ms to send but then had to wait 2.10s for the response (reproducible). This might indicate a bottleneck in processing the uploads on the backend side. Maybe it would be possible to stream the request data directly into the file, which should cut down the waiting time a lot. It should be possible to profile these requests and figure out where the time is spent.

For single chunks the MOVE request still takes quite some time. I assume this happens because the chunks are concatenated even when there is only one (which is slow because it has to read and write all the data). This case could be detected and the file moved without reading and writing it (which is not possible for all storages AFAICT, i.e. source and target need to be on the same physical disk to take advantage of it). This only affects uploads where the file size is less than the maximum chunk size. Due to the current low limit it is not really noticeable, but with a higher maximum chunk size this would affect many more and bigger uploads and could lead to quite a performance improvement for those.
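Both findings should be reproducible outside the browser by timing the three WebDAV requests of a single-chunk upload through the dav/uploads endpoint individually, roughly like the sketch below (the transfer id manual-test and the chunk name 000001 are arbitrary placeholders). The timing of the PUT covers sending the data plus the server-side processing of the chunk, while the timing of the MOVE isolates the assembly cost.

time curl -u admin:admin -X MKCOL "http://localhost:8080/remote.php/dav/uploads/admin/manual-test"

time curl -u admin:admin --upload-file 1G.bin "http://localhost:8080/remote.php/dav/uploads/admin/manual-test/000001"

time curl -u admin:admin -X MOVE "http://localhost:8080/remote.php/dav/uploads/admin/manual-test/.file" -H "Destination: http://localhost:8080/remote.php/dav/files/admin/1G.bin"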

Upload ETA calculations are all over the place due to the varying speed of a chunk upload over its lifetime. They could be improved by taking the overall time a chunk needs to upload and multiplying it by the number of remaining chunks. This should be a lot more reliable, as every chunk should have similar upload characteristics. To smooth out the value it could take the last 3-5 chunks into account.
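In other words, something along the lines of: ETA ≈ (average duration of the last N completed chunks) × (number of remaining chunks), with N around 3-5.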

joshtrichards commented 1 week ago

Good stuff.

A couple thoughts:

[^limits]: Reasons this happens: Apache LimitRequestBody defaults to 1GB these days (used to be unlimited by default), the CloudFlare freebie level (100MB) which leads to lots of false bug reports/poor user experiences, and various other web/proxy body/transaction size limits.

[^size]: e.g. see nextcloud/desktop#4278

provokateurin commented 1 week ago

@joshtrichards thanks for your input! Having pointers to related topics is highly appreciated :)

provokateurin commented 1 week ago

Hi, @nextcloud/desktop would it be possible to sit down with some of you at the Conference in two weeks and discuss this? I think we can figure out something together, solve the problems and unify the behavior across clients and the web.

provokateurin commented 1 week ago

Also @joshtrichards (and anyone else) feel free to join in case this takes place. You seem to know quite a lot of the pain points around this problem, which would be really helpful to consider.

camilasan commented 1 week ago

@provokateurin Matthieu and I will be at the conference, could you put something on the calendar for this? Will you join us @joshtrichards? :)

provokateurin commented 1 week ago

I'll try to find a time slot (the calendar is really full huh) and get it added to the official agenda so everyone interested can join. To avoid collisions with any other event it probably has to be late afternoon.

provokateurin commented 1 week ago

Ok, the meeting will take place Wednesday 18.9. 16:00 at Motion Lab. The room is still to be determined, but will be available in the public agenda and calendar on time.

joshtrichards commented 1 week ago

I wish I could join, but I'll not be in town for the conference.

provokateurin commented 1 week ago

I just realized I might have made a mistake. Instead of the 10 workers I think only 4 were available. Given that the chunked upload tries to upload 5 concurrent chunks, at least one of them might have been stalled the whole time. I will make new measurements to confirm whether this is the case and affected the results. Overall it shouldn't change the picture though, as at most 20% of the performance was lost due to this error, which is a lot less than the overall performance loss observed with multiple concurrent chunks.

cfiehe commented 4 days ago

During my analysis of https://github.com/nextcloud/server/issues/47856, I have found some interesting points regarding file upload optimization in Nextcloud:

One problem in the case of local storage is that OCA\DAV\Connector\Sabre\Directory does not handle moveInto efficiently. It always falls back to Sabre's default copy-delete handling instead of using a more efficient rename when source and target are located on the same storage.

Another problem is in View.php, where Nextcloud tells the storage to use rename instead of moveFromStorage only when source and target storage are represented by the same object ($storage1 === $storage2). This criterion is too strict in my opinion and will never be true in the case of groupfolders, where an individual storage representation is used. The criterion also does not play well with all those storage wrappers and prevents a more efficient handling of renames within the same storage.

provokateurin commented 4 days ago

@cfiehe thanks for your insights! We will consider them in the discussion, although this issue is more about the chunked upload performance itself. I only noticed similar issues and noted them down here so they can be further investigated in the future.