iterate-ch / cyberduck

Cyberduck is a libre FTP, SFTP, WebDAV, Amazon S3, Backblaze B2, Microsoft Azure & OneDrive and OpenStack Swift file transfer client for Mac and Windows.
https://cyberduck.io/
GNU General Public License v3.0

Segmented download I/O #16433

vt-idiot opened this issue 5 days ago

vt-idiot commented 5 days ago

Describe the bug

Cyberduck currently attempts to assemble all .cyberducksegment files for multiple downloads concurrently. This results in excessive disk thrashing (and doubled writes), since it is also currently hardcoded to store the segments in a folder adjacent to the requested download location #11841

This is made worse by the fact that no status updates are shown within Cyberduck itself other than "Disconnecting" #13610

To Reproduce

  1. Attempt to download several folders with sufficiently large files within them, see below
  2. Watch as it gradually becomes an order of magnitude slower than attempting a non-segmented download

Attempted Transfers (Actual Example)

Transfer1      
    \Folder1
        \File1-1.mp4        5.48 GiB
        \File1-2.mp4        4.42 GiB
Transfer2
    \Folder2
        \File2-1.mp4        6.61 GiB
        \File2-2.mp4        3.04 GiB
Transfer3
    \Folder3
        \File3-1.mp4        2.76 GiB
        \File3-2.mp4        2.83 GiB
        \File3-3.mp4        1.90 GiB
    \Folder4
        \File4-1.mp4        6.79 GiB
        \File4-2.mp4        5.55 GiB
    \Folder5
        \File5-1.mp4        3.90 GiB
        \File5-2.mp4        5.25 GiB

Transfer3 on its own would've been more than sufficient to cause thrashing. Any scenario involving segmented downloads causes some, but a single file being concatenated might go unnoticed provided it isn't larger than a few gigabytes.

There was seemingly no rhyme or reason to the order in which files were actually finished; File3-3.mp4 was done and re-assembled before Transfer1 & Transfer2 had even finished downloading.

Add'l Notes on Reproduction (Server, Settings, etc.)


Expected behavior

I am not entirely sure what the ideal expected behavior is. As a bare-minimum fix, there should be some kind of queueing a bit more sophisticated than "first-come, first-served, everybody at the same time"; Cyberduck is currently attempting to concatenate multiple files simultaneously.
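Purely as an illustration of that bare-minimum fix (this is not Cyberduck's code; the class and method names are invented), re-assembly jobs could be funneled through a single-threaded executor so the downloads stay parallel but only one file is ever being concatenated at a time:

```java
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not Cyberduck's internals: downloads stay parallel, but
// all re-assembly jobs go through one worker thread, so only one file is ever
// being concatenated and the disk sees sequential I/O per file.
final class ConcatenationQueue {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queue re-assembly of one finished download; jobs run strictly one after another. */
    public Future<Void> submit(final Path target, final List<Path> segments) {
        return worker.submit(() -> {
            try (OutputStream out = Files.newOutputStream(target)) {
                for (final Path segment : segments) {
                    Files.copy(segment, out); // sequential read of one segment, appended in order
                }
            }
            return null;
        });
    }

    public void shutdown() {
        worker.shutdown();
    }
}
```

Whether such a queue should also yield to still-running downloads is a separate policy question; the point is just that the disk would only ever see one concatenation stream at a time.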

Concatenating multiple files simultaneously makes the job take an order of magnitude longer on any single mechanical hard drive, likely on any SSD "worse" than an MLC SSD with a sufficient DRAM cache, and probably on most RAID arrays with parity. Obviously something like SSDs in RAID, or even hard disks in RAID10, will probably be fine, but you shouldn't need storage tuned for maximum SQL database performance to handle a segmented download. I'm not sure how a CoW filesystem would fare, as I don't currently have a ReFS partition or ZFS to test with (at least not one where I have permission to install Cyberduck and download large files willy-nilly), nor do I have the slightest idea how macOS handles low-level file operations.

A DRAM-less QLC SSD fares just as badly as my hard drive did before the concatenating even starts; I've tested that.

I can really only see the current implementation being faster on something like an NVMe drive, or on something obscene like an Optane drive (RIP) or a RAM disk, and only when the underlying file system isn't garbage. Something Really Crappy™ like an exFAT-formatted 2.5" SMR hard disk would probably just keel over.


How necessary are the segments in the first place?

In aria2c, for example, I can set falloc as the file allocation method, which appears to work and instantly creates a "sparse file" on NTFS drives in Windows, and then download a 30 GiB file with 5 simultaneous connections without having to worry about aria2c "putting it back together" later. The same applies to wget, which I believe opens multiple connections by default. Standalone torrent clients can similarly download files in "pieces" using sparse files (on NTFS, and their equivalents elsewhere?).

You can make either aria2c or e.g. qBittorrent take significantly longer if you enable or force "pre-allocation" and have it write out zeros for the entire file, but at least that completes at or close to the drive's sequential write speed, "bad" SSDs or SMR hard drives notwithstanding.
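To make the sparse-file idea concrete, here is a rough sketch of what that could look like in Java, under my assumptions (the class is invented, and whether the skipped ranges stay sparse or get zero-filled is up to the filesystem, not the code): pre-size the target once, then let every connection write its byte range in place through a shared FileChannel, so there is nothing to put back together afterwards.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

// Rough sketch only, not how Cyberduck works today: the file is created at its
// final length up front and every connection writes its own byte range in place.
final class InPlaceSegmentWriter implements AutoCloseable {

    private final RandomAccessFile file;
    private final FileChannel channel;

    InPlaceSegmentWriter(final Path target, final long totalSize) throws IOException {
        this.file = new RandomAccessFile(target.toFile(), "rw");
        // Logically extends the file; whether the hole is zero-filled now or
        // left sparse is decided by the filesystem, not by Java.
        this.file.setLength(totalSize);
        this.channel = file.getChannel();
    }

    /** Called from any download connection: write a chunk at its absolute offset. */
    void write(final ByteBuffer chunk, final long offset) throws IOException {
        long position = offset;
        while (chunk.hasRemaining()) {
            position += channel.write(chunk, position); // positional writes don't share a cursor
        }
    }

    @Override
    public void close() throws IOException {
        file.close(); // also closes the channel
    }
}
```

The obvious trade-off, as noted further down in this thread, is that without segment files on disk there is no built-in record of which ranges are already complete.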

Screenshots

(screenshot) Don't mind the censor blocks; nobody needs to know about these ISO(BMFF) files.

Note the Date Created/Modified and remaining sizes in the segment folders. There doesn't really appear to be any method to the madness, and some of the later-started downloads were seemingly done concatenating their files before the very first file had even finished downloading. As one would expect when throwing what essentially becomes random I/O at a hard drive.


(screenshot) Performance improving as the queue shrank from 3 files remaining to 1; I regret not getting a screenshot when it was trying to put six back together at the same time.

If, for example, only a single file could be concatenated at a time, performance would still improve over the current setup, even if other downloads were still transferring. The current implementation seems to be the slowest way multiple segmented downloads can possibly be handled.


(screenshot: CrystalDiskMark_20241016142450) Default CrystalDiskMark results for the stinky old hard drive.

(screenshot: CrystalDiskMark_20241016143711) Ignore the sequential read speed, but that random I/O performance seems to be close to how Cyberduck behaves at the moment.

It's an older 512-byte-sector drive with the default 4 KiB NTFS allocation unit size. I'm not sure how Cyberduck handles I/O directly, but a simple fix might also be making larger reads/writes when concatenating, or giving the user the option to set a (RAM) cache for reads of .cyberducksegment files, or increasing its size if one already exists. I've never handled disk I/O like this before at any kind of low level.
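I don't know what buffer size Cyberduck uses internally, so treat this strictly as a sketch of the "larger reads/writes" idea (the 16 MiB figure is an arbitrary example, not an existing setting): copying each segment through one big reusable buffer means the drive sees multi-megabyte sequential bursts instead of lots of 4 KiB-ish operations.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Sketch only: concatenate segments with a large reusable buffer so reads and
// writes hit the disk in multi-megabyte chunks. 16 MiB is a made-up figure.
final class BufferedConcatenator {

    private static final int BUFFER_SIZE = 16 * 1024 * 1024;

    static void concatenate(final Path target, final List<Path> segments) throws IOException {
        final ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (final Path segment : segments) {
                try (FileChannel in = FileChannel.open(segment, StandardOpenOption.READ)) {
                    while (in.read(buffer) != -1) {
                        buffer.flip();
                        while (buffer.hasRemaining()) {
                            out.write(buffer); // one large write per filled buffer
                        }
                        buffer.clear();
                    }
                }
            }
        }
    }
}
```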

Specs

Log Files

Oops. Sorry, but I am not repeating a download that was done transferring nearly an hour before the files were actually "done," and I have already disabled segmented downloads for the time being.

Additional context

If one downloads a single moderately large file, e.g. 1-2 GiB, this might be hardly noticeable, as a hard drive would only spend a minute or so concatenating the file, and even the crappiest of cheap SSDs doesn't start to choke until somewhat beyond 1-2 GiB.

A 10 GiB folder with five 2 GiB files may not be immediately noticeable either, since it seems that Cyberduck begins concatenating each file as soon as it is done downloading, and depending on connection limits in Cyberduck and on the FTP server itself, it might only be downloading 1-2 files at a time. I'm not sure if it prioritizes remaining downloads over concatenation jobs or vice versa, if at all.

This might be even less noticeable if the last file(s) in the download order are smaller, since there's less "catching up" I/O to do.

\Folder
       \00-Intro.mp4       0.5 GiB
       \01-Part1.mp4       2.5 GiB
       \02-Part2.mp4       2.5 GiB
       \03-Credits.mp4     0.2 GiB
       \04-Extras.mp4      0.5 GiB

TL;DR

The problems begin when a single large (>5 GiB) file has to be concatenated, and it gets exponentially worse as the number of (simultaneously) downloaded files and/or the number of "Transfer" jobs increases.

vt-idiot commented 5 days ago

I don't think this qualifies as a duplicate of #10961 - I'm sure the improvements made by #13000 were significant, but there are still performance issues.

I do also understand that segmented downloads are probably best suited to scenarios where individual connections might be limited in speed (S3, SFTP?) and are (at least currently) really only usable on SSDs. I am aware the segmented download itself only resulted in marginally faster transfer speeds when using FTPs.

AliveDevil commented 4 days ago

Throwing this in here regarding the sparse file usage: the segment files currently allow validating whether a given segment has already finished downloading completely, regardless of whether the backend storage allows segmented downloads.

As transfers in Cyberduck don't know anything about their progress when at rest, there is no recoverability without the segments, and complete restarts are the only other option.

I don't know whether FileChannels can pre-allocate their storage.
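As far as I can tell from the standard API (treat this as an unverified sketch, not a statement about Cyberduck): FileChannel.truncate() can only shrink a file and the JDK exposes no real fallocate(), but a channel can grow a file to its final size by writing a single byte at the last offset, or RandomAccessFile.setLength() can be called before obtaining the channel. Whether the resulting gap is stored sparsely or zero-filled is filesystem-dependent.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: grow a file to its final size through the channel itself. There is
// no fallocate() equivalent in the standard JDK; whether the hole is left
// sparse or zero-filled is up to the filesystem.
final class Preallocate {

    static FileChannel openPreallocated(final Path target, final long size) throws IOException {
        final FileChannel channel = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        if (size > 0) {
            // Writing one byte at (size - 1) extends the file to `size`.
            channel.write(ByteBuffer.wrap(new byte[]{0}), size - 1);
        }
        return channel;
    }
}
```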