qbittorrent / qBittorrent

qBittorrent BitTorrent client
https://www.qbittorrent.org

piece_extent_affinity benchmark thread #11873

Closed. FranciscoPombal closed this issue 2 years ago.

FranciscoPombal commented 4 years ago

https://github.com/qbittorrent/qBittorrent/pull/11781 was merged, so as promised in https://github.com/qbittorrent/qBittorrent/issues/11436, here is the thread for testing the performance improvement of piece_extent_affinity.

Versions of qBittorrent with this option exposed in the advanced settings:

Libtorrent version required: 1.2.2 or later (with older versions, the option may still be visible in qBittorrent's WebUI, but it will have no effect).
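
For anyone scripting a test outside the GUI, here is a minimal sketch (mine, not from this thread) of flipping the option through the libtorrent Python bindings; it assumes python-libtorrent 1.2.2 or later and an otherwise default session:

```python
import libtorrent as lt  # python-libtorrent, assumed 1.2.2 or later

# piece_extent_affinity is a settings_pack key available since libtorrent 1.2.2;
# qBittorrent's advanced setting maps to the same flag.
ses = lt.session({"piece_extent_affinity": True})

# Sanity check that the session accepted the setting.
print(ses.get_settings()["piece_extent_affinity"])
```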

@xavier2k6 @fusk-l Feel free to post methodology, tests and results here.

xavier2k6 commented 4 years ago

@Seeker2 @arvidn FYI

@FranciscoPombal @thalieht Could this be labeled with discussion and maybe core/meta/libtorrent/performance, or whatever ye may deem necessary yourselves?

apexlir commented 4 years ago

Just did a quick test with qBittorrent 4.2.5 and libtorrent 1.2.6. Enabling this option did reduce the I/O wait, but it seems bandwidth was also reduced by around 200 Mbps and load increased a bit:

Disabled: [screenshot: bench-disabled]

Enabled: [screenshot: bench-enabled]

Hardware: Intel 4-core (no HT), 16 GB RAM, Samsung 970 Plus NVMe (XFS)

xavier2k6 commented 4 years ago

@apexlir What program is that? Is it Prometheus?

apexlir commented 4 years ago

Telegraf / Influx / Grafana with a 5s interval

arvidn commented 4 years ago

Interesting. In a way, you would expect lower I/O wait to cause higher CPU usage (because it has to wait less), but then the drop in bandwidth would be a bit of a mystery. This option will probably make piece picking a little bit more expensive, since it adds this additional constraint/bias, so that could possibly explain some of it.

I wouldn't expect an NVMe drive to see the greatest benefit from this option. I suspect that all of these writes just go straight into the page cache anyway, but it may be contributing to fewer page faults, as writes are concentrated to fewer pages, and once a page is flushed it's unlikely to have to be faulted in again, I suppose.

FranciscoPombal commented 4 years ago

@apexlir Thanks for the benchmarks.

The difference in the scale and labeling of the graphs is a bit unfortunate, since it makes it quite hard to compare them, especially when the differences are so small. That being said, the drop in CPU I/O wait is quite noticeable, and the drop in bandwidth is small (in relative terms, about 3-4%), but noticeable as well. The CPU usage looks about the same to me, but again it's hard to compare. Logically it should be higher, because the piece picker logic is more complex when piece extent affinity is ON. I wonder if the CPU is actually becoming the bottleneck here.

Of course, I'll echo that one would expect hard drives to benefit the most from this setting. Furthermore, I would expect the performance uplift to be more significant if torrents are not downloading in sequential mode (I assume libtorrent's auto sequential mode kicked in for this benchmark run). It would be interesting to see both cases benched with hard drives, if someone is up for that.
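
As a side note for anyone setting up such a run, here is a rough sketch (an assumption on my part, not anything used above) of pinning both relevant settings through the libtorrent Python bindings so the regular piece picker stays in play:

```python
import libtorrent as lt  # python-libtorrent, assumed 1.2.2 or later

ses = lt.session()
ses.apply_settings({
    "auto_sequential": False,       # keep downloads non-sequential so the piece picker is exercised
    "piece_extent_affinity": True,  # set to False for the comparison run
})
```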

xavier2k6 commented 4 years ago

@FranciscoPombal Perhaps it would be a good idea to add an editable table like the one below to the first post, so that all the information/benchmarks are in one place, with the set of information @arvidn needs to determine what is working right and what needs to be tweaked. It would also act as a guideline for those wishing to benchmark.

Perhaps a consensus could be reached on which tool is best to use on Windows/Linux/macOS, so that everyone will be singing from the same hymn sheet!

Thoughts?

| P.E.A OFF/ON | Torrent Size | HDD or SSD | Piece Size | I/O | Bandwidth |
| --- | --- | --- | --- | --- | --- |
| ON | 208 GiB | HDD | 256 KiB | ABC123 | ABC123 |
| ON | 340 GiB | SSD | 4 MiB | ABC123 | ABC123 |
| OFF | 320 GiB | SSHD | 8 MiB | ABC123 | ABC123 |
| OFF | 139 GiB | SSD | 16 MiB | ABC123 | ABC123 |
FranciscoPombal commented 4 years ago

@xavier2k6 Unfortunately, I don't think such a table is sufficient to produce meaningful data for analyzing the more minute differences. We need at least as much data, across the same kinds of metrics, as @apexlir provided, with well-controlled environments and settings, and in a format that is usable for processing, like CSV. The torrent size is also not that relevant, as long as it's big enough to exhaust whatever caches there are (this also depends on the amount of RAM a user has, but what I'm saying is there is generally no need to test with 100+ GiB torrents for this purpose).

However, I expect the differences between the more extreme scenarios (e.g. NVMe vs HDD) to be visible even with less rigorous testing.
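
To illustrate the kind of capture meant here (not an agreed-upon tool, just a sketch assuming psutil and mirroring the 5 s Telegraf interval used above), something like this would produce a CSV that is easy to process afterwards:

```python
import csv
import time

import psutil  # assumed metrics library; any collector with CSV output would do

# Sample overall CPU usage, I/O wait and received bytes every 5 seconds
# and append them to a CSV file until interrupted (Ctrl+C).
with open("bench.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["timestamp", "cpu_percent", "iowait_percent", "net_recv_bytes"])
    prev_recv = psutil.net_io_counters().bytes_recv
    while True:
        time.sleep(5)
        cpu = psutil.cpu_times_percent(interval=None)  # percentages since the last call
        recv = psutil.net_io_counters().bytes_recv
        writer.writerow([
            int(time.time()),
            round(100.0 - cpu.idle, 1),
            round(getattr(cpu, "iowait", 0.0), 1),  # iowait is only reported on Linux
            recv - prev_recv,                       # bytes received during the last 5 s
        ])
        f.flush()
        prev_recv = recv
```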

xavier2k6 commented 4 years ago

> but what I'm saying is there is generally no need to test with 100+ GiB torrents for this purpose

I wasn't suggesting using torrents of that size for testing purposes... it was just an example, a quickly edited table from what I had used in a previous issue/thread.

The table usage was only a suggestion for "summary purposes" so all the relevant gathered info could be in one place.

apexlir commented 4 years ago

This was a quick and dirty bench, by no means very scientific, since I share the port and depend on 40 public seeders. CPU utilization was indeed up by 5%.

Did another run, this time on a software RAID-0 of WD Red drives (btrfs):

RUN1: ON, RUN2: OFF, RUN3: ON [screenshot: bench-red]

I think we should discard the first run, but there is still a 6% drop in throughput between run 2 and run 3:

[screenshots: red-enabled, red-disabled, red-enabled-2nd-run]

CPU utilization was up by a few percent but not very significant this time. Curious to see the results on a standard 100 Mbps to 1 Gbps line.

yeezylife commented 2 years ago

I wonder whether SSDs benefit from piece_extent_affinity.