rpm-software-management / createrepo_c

C implementation of the createrepo.
http://rpm-software-management.github.io/createrepo_c
GNU General Public License v2.0
100 stars 93 forks source link

Set compression level for zstd to level 10 #396

Closed dirkmueller closed 12 months ago

dirkmueller commented 12 months ago

This requires a good amount of saving due to larger dictionary size with only marginal more time effort for compression. decompression time is unaffected.

dralley commented 11 months ago

An 18% compute tradeoff for a ~1% improvement might make sense for repos which are frequently downloaded, but the tradeoff is questionable for repos that will only be used a couple dozen, hundred or perhaps thousand times.

@praiskup Do you have an opinion on this? Would you classify it as positive, negative or basically irrelevant for a service like COPR?

praiskup commented 11 months ago

In Copr the "production vs. consumption" ratio is much higher compared to normal distributions, and the compression performance matters. So as long as we could opt-out I think it is a more or less neutral proposal, but without an option this is negative. I'm not sure what would be the effect of this after moving Copr repos to PULP.

dirkmueller commented 11 months ago

@praiskup without further patches, overall createrepo_c is I/O bound, not cpu bound. the shift from 9 to 10 is merely 20% increase which is well in the noise taking I/O overhead into account. I've benchmarked this extensively for Open Build Service usecases which is similar to COPR, and it is neutral.

praiskup commented 11 months ago

Copr's createrepo_c runs are (very often) CPU bound. We use --recycle-pkglist and similar options most of the time to minimize I/O when re-generating repositories.

dirkmueller commented 11 months ago

@praiskup understood, we're doing the same. still profiling indicated that the compression only adds 30% of the total runtime. anyway, if you feel strongly, I can make it an option, and then I would even be able to go to higher compression levels.

dirkmueller commented 11 months ago

@praiskup just let me know if an extra option for 5% runtime (20% of 30%) is needed, and I'll do a PR

praiskup commented 11 months ago

I do not recall all the details from https://github.com/rpm-software-management/createrepo_c/pull/191 now, though you might be right that 70% is XML dumping, and 30% compression. But I/O is often more or less zero (adding one package into a repository with 100k RPMs and more).

Having an option would be nice. I missed that this is already merged; then others probably think it is worth it, let's deal with the option later. We'll report back.