rpm-software-management / librepo

A library providing C and Python (libcURL like) API for downloading packages and linux repository metadata in rpm-md format
http://rpm-software-management.github.io/librepo/
GNU Lesser General Public License v2.1

Increase checksum buffer to 128kb, improving download performance. #295

Open stewartsmith opened 5 months ago

stewartsmith commented 5 months ago

Reading 2 KB at a time to compute the checksum limits network throughput. Bumping the buffer up to 128 KB seems to give a good balance of memory usage and performance.
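To illustrate the idea outside of librepo, here is a minimal Python sketch of a chunked file checksum with a configurable buffer size. It is not the librepo code (which is C); `checksum_file` and `CHUNK_SIZE` are hypothetical names, and SHA-256 is just an assumed algorithm for the example. The function also counts how many reads returned data, since fewer, larger reads is where the saving would come from.

```python
import hashlib

CHUNK_SIZE = 128 * 1024  # 128 KB, the buffer size proposed in this PR


def checksum_file(path, chunk_size=CHUNK_SIZE, algorithm="sha256"):
    """Return (hex digest, number of reads that returned data) for path.

    Hypothetical helper for illustration only; not part of librepo.
    """
    digest = hashlib.new(algorithm)
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    reads = 0
    # buffering=0 yields a raw FileIO object, so Python adds no extra
    # buffering layer between readinto() and the underlying read call.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 bytes means end of file
                break
            digest.update(view[:n])  # hash only the bytes actually read
            reads += 1
    return digest.hexdigest(), reads
```

Calling `checksum_file(path, 2048)` and `checksum_file(path)` on the same file should give the same digest, with the 2 KB variant needing far more read calls for a large file. Whether that translates into a measurable speedup depends on the workload.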

Benchmarks run on an m5n.16xlarge EC2 instance, doing a reposync of the Amazon Linux 2023 x86-64 repositories, showed that this change, when combined with the (smaller) benefits of my patch that avoids libc buffered IO, reduces system CPU time by another half second and cuts a further 3 seconds off total time:

102s (original) -> 99s (no libc buffered IO) -> 95s (this patch)

stewartsmith commented 5 months ago

For reference, my benchmarking has been done on an m5n.16xlarge EC2 instance, against both the in-region S3 buckets and the CDN repositories. That instance type has 256GB of memory, a 75Gbit network connection, and is a 64-core Cascade Lake system. The root volume is a 256GB gp3 EBS volume with 500MB/sec of IO and 3000 IOPS.

The background to this is that a lot of EC2 instances don't live that long (relatively speaking) and never install RPMs except on launch. So all the time spent installing RPMs is time spent scaling up a system, time that would be better spent running the customer workload.

Goes well when paired with https://github.com/rpm-software-management/librepo/pull/294