QB moves multiple complete torrents at a time, leading to super high IO contention.

CyrusNajmabadi commented 6 years ago

Please provide the following information

qBittorrent version and Operating System

QB: 4.1.2 OS: Win10Pro 1803

What is the problem

I have my qb set to download torrents to a very fast SSD array. When complete, those torrents are then moved to a much slower HD array. However, QB is now moving many completed torrents over at the same time. This leads to huge contention on the HD array, tanking the move speed: instead of moving at 100+ MB/s, the contention drops the overall transfer speed to 15-25 MB/s.

QB didn't use to work this way afaict. It used to move the completed torrents over one at a time.

Mike-EE commented 4 years ago

Since this is already being discussed here...

I participate in several open-source radio projects, as my programming experience is heavily rooted in embedded software and FPGA programming, and I have to say that the ethos of these programs is "core functions come first". Topics related to I/O and other platform functions get the most attention and tickets related to I/O issues are handled first.

I don't perceive that the core qBt development team thinks the same way and it's incredibly frustrating to experience the debilitating I/O issues in qBt, try to work around them and then read debates about implementing features that have no impact on debilitating I/O issues. I would want, if it were me, the "basics" to work well before building more "stuff" into the project.

I sense in this Git and the forum a growing frustration from dealing with memory consumption, storage handling and similar issues for (in some cases) years and we're seeing that reflected in threads such as this. I know from my own experience that no-one wants their passion project to become another "day job" but I would expect that anyone with a key role in this project would want their project to be broadly useful and resolutely reliable, which sadly isn't the way I would describe qBt today.

I implore the qBt development team, including the library developers, to "go after" the I/O (network, storage, etc.) problems and retire these lingering, multi-year issues before efforting feature growth. I know it's more fun to work on new things, but nobody can enjoy the latest "new thing" if the program self-terminates or overloads the storage subsystem before they can really enjoy that "new thing".

Thanks for reading.

arvidn commented 4 years ago

@Mike-EE I assume by "library developers" you mean me. Specifically regarding moving the storage of a torrent from one disk to another, I've made the case, in this thread, that I don't consider adding any sophistication around queuing/serializing this operation in scope of libtorrent (sort of like how one wouldn't expect copyfile() to have any such sophistication). I welcome arguments against my position (but it should probably be on the libtorrent issue tracker, to not pollute this thread any more). One reason for this is exactly because disk I/O and memory efficiency have much higher priority in my book, along with fundamental networking performance (again, I welcome discussions about this on the libtorrent issue tracker). Memory consumption almost always spills into the behavior of the operating system's disk cache though, which (to some extent) I also consider outside the scope of libtorrent, at least heroics to strong-arm the OS' caching strategies.

arvidn commented 4 years ago

also, for context. These are the libtorrent donations: https://bitref.com/373ZDeQgQSQNuxdinNAPnQ63CRNn4iEXzg :)

CyrusNajmabadi commented 4 years ago

this operation in scope of libtorrent

I would understand this if libtorrent didn't have anything to do with moving files. However, it provides the entrypoint that torrent programs use to actually move the files. As such, it should exhibit good behavior around that.

The same issue existed (and was fixed) wrt checking files. The same argument could have been made that it could just be up to the torrent program to queue files that need to be checked, and libtorrent simply checked when asked. However, libtorrent dealt with this significant problem. It exposes the functionality to check, and it does so intelligently with IO. If it's also handling moving IO then i think it should do the same.

The arguments that this isn't a core part of bittorrent ring a little hollow to me. If it's not a part of bittorrent, then just don't expose moving at all as part of the libtorrent API. However, if it is useful and valuable to have in your library/api, i think it behooves the library/api to do it well.

sort of like how one wouldn't expect copyfile() to have any such sophistication

Actually, i would. And OSs do expose this sort of functionality. Precisely because thrashing the IO subsystem is not necessarily a good thing (which i presume is why libtorrent doesn't check multiple files at the same time anymore either). I know i've personally used those sorts of APIs in my heavy IO apps. And, for any heavy IO lib, it's def been important to ensure that the lib itself is conscientious of how it schedules IO.

CyrusNajmabadi commented 4 years ago

also, for context. These are the libtorrent donations:

What sort of donation would you be looking for here to do this work? What's the right amount of money to motivate it? Thanks! :)

arvidn commented 4 years ago

As such, it should exhibit good behavior around that.

"Good behavior" for one user may be bad behavior for another. For instance, someone might expect to be able to move files located on separate drives in parallel. I think it's more appropriate to talk about the behavior as higher or lower level. The current functionality is low level. It allows implementing the higher level on top of it. If libtorrent only provided the high level API, how would you get access to the low level functionality? The API alone is a non-trivial challenge.

The same issue existed (and was fixed) wrt checking files.

The original behavior for checking files in libtorrent was to only check one at a time. This was extended to allow checking files in parallel (but the default was still one at a time).

The same argument could have been made that it could just be up to the torrent program to queue files that need to be checked, and libtorrent simply checked when asked. However, libtorrent dealt with this significant problem. It exposes the functionality to check, and it does so intelligently with IO. If it's also handling moving IO then i think it should do the same.

You made this argument earlier in the thread already, I won't repeat my response.

The arguments that this isn't a core part of bittorrent ring a little hollow to me. If it's not a part of bittorrent, then just don't expose moving at all as part of the libtorrent API.

So, you can pretend it isn't there. done.

However, if it is useful and valuable to have in your library/api, i think it behooves the library/api to do it well.

This sounds like the same argument as the "good behavior" one.

Actually, i would.

copyfile() copies the file, it doesn't queue it up and do it later. It's reasonable because it's a low level API.

Precisely because thrashing the IO subsystem is not necessarily a good thing

I don't think anyone is arguing that it is.

If you want to respond, PLEASE file a ticket in the libtorrent repo. It's right here. You can still link to the ticket in this thread.

CyrusNajmabadi commented 4 years ago

"Good behavior" for one user may be bad behavior for another. For instance, someone might expect to be able to move files located on separate drives in parallel.

Someone might expect to be able to check multiple files in parallel. Such expectations are reasonable, and i would have no problem with the software making it possible for people to specify that this is what they want.

However, this seems like the abnormal case. i.e. which is more likely across all your users: that they have torrents being moved simultaneously to multiple different drives? Or that they're moving to the same drive?

The software should work well for the more common case, and optionally provide ways to work well for the less common cases.

The original behavior for checking files in libtorrent was to only check one at a time. This was extended to allow checking files in parallel (but the default was still one at a time).

Sounds great. Having the same for moving would be awesome. Looks like there's good precedent for it in the library :)

I won't repeat my response.

As i said, your response rings hollow to me. All the reasons you've given for 'checking' to be done in the manner it is done, also seem to apply just as well to 'moving'.

copyfile() copies the file, it doesn't queue it up and do it later. It's reasonable because it's a low level API.

That's a choice of the API currently. There's no reason that you only have to supply a low-level copy op, just like there would be no need to just have a low-level 'check' API that the app layer would have to coordinate to prevent thrashing.

If you want to respond, PLEASE file a ticket in the libtorrent repo.

Sure.

glassez commented 4 years ago

Well, apparently this is a really serious problem, so I'll try to make my contribution here. I must say again that I have no idea about the money that you are talking about here. qBittorrent is just my hobby, and I spend my personal free time on it (although some money could save me from some other worries, so I could devote more free time to this project). Before I introduce any implementation, I would like to discuss some key aspects of its user interface. By and large, all we need to do is implement "queued" storage moving. I.e.:

When a torrent needs to be moved (by the user or by some trigger, e.g. "moving completed torrent from temp folder") we don't perform the move immediately but just mark it as "moving storage" and append it to the "moving queue" (in the corner case, when there are no items in the queue, it still performs the move immediately).
When a torrent has finished moving, we check the queue for the next item.

Note: the finished torrent should be considered as really finished only when it's done moving from temp directory.
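The queued-move idea above can be sketched roughly as follows. This is a minimal illustration in Python, not qBittorrent's actual implementation: `MoveQueue`, `request_move`, `on_move_finished` and the `start_move` callback are hypothetical names, and the real move would be whatever asynchronous call the client already uses.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class MoveQueue:
    """Serializes storage moves so that at most one runs at a time."""

    def __init__(self, start_move: Callable[[str, str], None]) -> None:
        # start_move is a hypothetical callback that kicks off the real move.
        self._start_move = start_move
        # The job at the head of the deque is the one currently in progress.
        self._jobs: Deque[Tuple[str, str]] = deque()

    def request_move(self, torrent_id: str, destination: str) -> None:
        """Queue a move; start it right away only if nothing else is queued."""
        self._jobs.append((torrent_id, destination))
        if len(self._jobs) == 1:
            self._start_move(torrent_id, destination)

    def on_move_finished(self, torrent_id: str) -> None:
        """Call when the in-progress move completes; starts the next one."""
        if not self._jobs or self._jobs[0][0] != torrent_id:
            raise ValueError(f"{torrent_id!r} is not the move in progress")
        self._jobs.popleft()
        if self._jobs:
            self._start_move(*self._jobs[0])

    def pending(self) -> int:
        """Number of moves queued, including the one in progress."""
        return len(self._jobs)
```

A torrent would be reported as "moving storage" for as long as it is anywhere in the queue, and as finished only once `on_move_finished` has been called for it.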

Are there any other global aspects that I need to take into account?

arvidn commented 4 years ago

just don't make the "special case" of an empty queue too special. You still need to add it to the queue, so that any subsequent move gets queued

glassez commented 4 years ago

just don't make the "special case" of an empty queue too special. You still need to add it to the queue, so that any subsequent move gets queued

Sure. "Task in progress" must remain in the queue until it is completed.

CyrusNajmabadi commented 4 years ago

Are there any other global aspects that I need to take into account?

Probably want to ensure that this works even if qb is closed/reopened. I.e. torrents queued for moving aren't somehow 'lost' if the user does that.
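One way to keep queued moves across a restart is to write the pending list to disk whenever it changes and reload it on startup. The sketch below is only an assumption about how that could look: the JSON file, its layout, and the function names are made up for illustration and are not how qBittorrent stores its state.

```python
import json
import os
import tempfile
from typing import List, Tuple


def save_pending_moves(path: str, jobs: List[Tuple[str, str]]) -> None:
    """Atomically write (torrent_id, destination) pairs to a JSON file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write cannot leave a truncated queue file behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump([list(job) for job in jobs], handle)
    os.replace(tmp_path, path)


def load_pending_moves(path: str) -> List[Tuple[str, str]]:
    """Read the queue back; a missing file simply means nothing is pending."""
    try:
        with open(path, encoding="utf-8") as handle:
            return [(torrent_id, dest) for torrent_id, dest in json.load(handle)]
    except FileNotFoundError:
        return []
```

On startup the client would call `load_pending_moves` and re-queue each entry in order before accepting new moves.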

Thanks for looking into this!

CyrusNajmabadi commented 4 years ago

Also, rather than having two queues (one for moving, and one for checking), there should really only be one 'heavy io' queue that is shared by both ops. While a check is going on, moves should not happen (and vice versa).

Basically, both these ops completely saturate the underlying io subsystem. So running them concurrently just kills performance.
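A single shared "heavy io" queue could be as simple as one worker that runs check and move jobs strictly one after another. The following is a sketch under that assumption, using Python's standard `ThreadPoolExecutor` with a single worker; `HeavyIoQueue` and its methods are hypothetical names, not part of qBittorrent or libtorrent.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class HeavyIoQueue:
    """Runs 'check' and 'move' jobs on one worker, in submission order."""

    def __init__(self) -> None:
        # One worker thread means no two heavy jobs ever overlap.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def submit(self, kind: str, job: Callable[[], T]) -> "Future[T]":
        """Queue a job; kind is only validated here, both kinds share the queue."""
        if kind not in ("check", "move"):
            raise ValueError(f"unknown job kind: {kind!r}")
        return self._executor.submit(job)

    def shutdown(self) -> None:
        """Wait for queued jobs to finish, then stop the worker."""
        self._executor.shutdown(wait=True)
```

Because both kinds of job go through the same single worker, a move submitted while a check is running simply waits its turn, and vice versa.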

Thanks!

arvidn commented 4 years ago

Ideally, moving torrents within the same drive would not be subject to queueing

CyrusNajmabadi commented 4 years ago

Ideally, moving torrents within the same drive would not be subject to queueing

FWIW, this feels like an unnecessary special case. Say you are moving on the same drive. Even if you queue, it won't be a problem as the queue will blitz through things super quickly. Even if you had a ton of moves (i.e. thousands), that's only going to be thousands of simple metadata-writes in a row, which should still take a tiny amount of time (perhaps a couple of seconds tops). So trying to optimize for that seems likely to be unnecessary.

Another thing that can make this difficult is that "the same drive" can be non-obvious. For example, in my system i use raided storage where the relationship between folders and drives isn't at all necessarily clear. I would hate for the system to think it could do things in parallel, while it was still hitting the same disk, thus negating the entire benefit of all this work :-/

arvidn commented 4 years ago

Say you are moving on the same drive. Even if you queue, it won't be a problem as the queue will blitz through things super quickly.

not if you have to sit and wait at the end of the queue for a bunch of moves that actually move across drives.

Another thing that can make this difficult is that "the same drive" can be non-obvious.

Yes, it's not trivial. one sure way to discover this is to attempt a rename(), and if it fails, you know you have to perform a copy.

arvidn commented 4 years ago

perhaps move_flags_t could be extended with an option to "no_fallback_to_copy", to have a cheap way to try the cheap operation first.
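The rename-first idea can be illustrated in a few lines. This Python sketch is not libtorrent code: `move_path` and its `fallback_to_copy` parameter are hypothetical stand-ins for the proposed flag. It relies on the rename failing with `EXDEV` when source and destination are on different filesystems; the extra check for Windows error 17 is a defensive assumption.

```python
import errno
import os
import shutil


def move_path(src: str, dst: str, fallback_to_copy: bool = True) -> str:
    """Try a cheap rename first; return "renamed" or "copied"."""
    try:
        os.rename(src, dst)
        return "renamed"
    except OSError as exc:
        # EXDEV means src and dst are on different filesystems. The winerror
        # check (17, ERROR_NOT_SAME_DEVICE) is a belt-and-braces assumption.
        cross_device = (
            exc.errno == errno.EXDEV or getattr(exc, "winerror", None) == 17
        )
        if not cross_device or not fallback_to_copy:
            raise
    # Expensive path: shutil.move retries the rename, then copies the data
    # and removes the source.
    shutil.move(src, dst)
    return "copied"
```

A caller could run `move_path(src, dst, fallback_to_copy=False)` immediately for every move and only put the ones that raise into the queue, so same-drive moves never wait behind cross-drive copies.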

CyrusNajmabadi commented 4 years ago

perhaps move_flags_t could be extended with an option to "no_fallback_to_copy", to have a cheap way to try the cheap operation first.

That seems very reasonable.

glassez commented 4 years ago

Wait, wait... It looks like you want to get something "excellent" right away, without having anything yet. This is a road to nowhere (given our current capabilities). As they say, the best is the enemy of the good. In addition, it seems that the "moving queue" will cover most of the problematic cases. At the very least, further improvement looks significantly less profitable in terms of cost/benefit ratio (at least given the current libtorrent implementation). Anyway, we can add improvements incrementally. So if the "moving queue" doesn't suit you, I won't even start doing anything.

arvidn commented 4 years ago

@glassez I'm not sure who you're referring to as "you" there. I have no skin in this game so feel free to ignore me.

glassez commented 4 years ago

@arvidn, is it hard to implement an option to set libtorrent to use only one thread for "move storage" jobs?

arvidn commented 4 years ago

well. you could set the number of disk I/O threads to 1 as long as there are any pending move jobs. that would affect all other disk I/O though, and prevent peers from requesting data while waiting for the move to complete. not ideal.

I can't think of a particularly simple way to do this. There are two queues right now, one for hash jobs and one for all other disk I/O jobs. iirc, this is to prevent hash jobs from starving all other jobs. maybe something similar could be done for move jobs.

iheartcsharp commented 4 years ago

I wish I could help out but I'm heavy on the C# side and learning the dev tools and C++ will take a long time. I think the best option is to just have qBittorrent keep a global queue variable and move the files one by one as a stop-gap measure until there is time to add smarter logic. At least if the files move 1 by 1 it will be much better than the current behavior, which causes thrashing of the drives and fragmenting of the files.

glassez commented 4 years ago

Working on it currently...

qbittorrent / qBittorrent

QB moves multiple complete torrents at a time, leading to super high IO contention. #9407
