Open jengelh opened 1 year ago
cross-checking expected baseline with dnf:
# dnf5 install --downloadonly $(cat ghclist)
# sync; time dnf5 install -y $(cat ghclist)
…
[158/158] Total 100% | 0.0 B/s | 0.0 B | 00m00s
Running transaction
[ 1/160] Verify package files 100% | 993.0 B/s | 158.0 B | 00m00s
[ 2/160] Prepare transaction 100% | 1.0 KiB/s | 158.0 B | 00m00s
[ 3/160] Installing ghc-base-0:4.17.2.0-1.2.x86_64 100% | 189.0 MiB/s | 19.3 MiB | 00m00s
…
[160/160] Installing ghc-utf8-string-0:1.0.2-2.10.x86_64 100% | 1.2 MiB/s | 296.6 KiB | 00m00s
real 0m4.599s
user 0m2.451s
sys 0m1.128s
# sync; time dnf5 remove -y 'ghc-*'
…
Transaction Summary:
Removing: 158 packages
After this operation 211 MiB will be freed (install 0 B, remove 211 MiB).
Running transaction
[ 1/159] Prepare transaction 100% | 1.0 KiB/s | 158.0 B | 00m00s
[ 2/159] Erasing ghc-commonmark-pandoc-0:0.2.1.3-2.15.x86_64 100% | 444.0 B/s | 4.0 B | 00m00s
…
[159/159] Erasing ghc-base-0:4.17.2.0-1.2.x86_64 100% | 108.0 B/s | 16.0 B | 00m00s
>>> Running post-uninstall scriptlet: ghc-base-0:4.17.2.0-1.2.x86_64
>>> Stop post-uninstall scriptlet: ghc-base-0:4.17.2.0-1.2.x86_64
real 0m2.587s
user 0m1.078s
sys 0m0.820s
Could you provide the zypper.log of the install and remove commands? And if you're about to measure again, it would be interesting to see whether the numbers for ZYPP_SINGLE_RPMTRANS=1 zypper ... also differ that much.
We are aware that several non-install-related actions consume extra time and could be improved. Most of them are related to repo autorefresh and cache building.
The traditional zypp backend and rpm/dnf are hard to compare, because the traditional backend forks and execs rpm for each single package. The SINGLE_RPMTRANS backend uses librpm to form a single transaction. The non-install-related pre- and post-processing, however, is the same in both cases.
ZYPP_SINGLE_RPMTRANS=1 provides the desirable execution time characteristics :+1: . Can this be made default?
Commands for obtaining timings should be shown above. Today I am getting a runtime of about 15s (my system was probably a bit loaded at the time of the report). The exact list of the 158 packages was:
ghc-Glob ghc-JuicyPixels ghc-OneTuple ghc-Only ghc-QuickCheck ghc-SHA ghc-StateVar ghc-aeson ghc-aeson-pretty ghc-ansi-terminal ghc-ansi-terminal-types ghc-appar ghc-array ghc-asn1-encoding ghc-asn1-parse ghc-asn1-types ghc-assoc ghc-async ghc-attoparsec ghc-base ghc-base-compat ghc-base-compat-batteries ghc-base-orphans ghc-base16-bytestring ghc-base64 ghc-base64-bytestring ghc-basement ghc-bifunctors ghc-binary ghc-bitvec ghc-blaze-builder ghc-blaze-html ghc-blaze-markup ghc-byteorder ghc-bytestring ghc-case-insensitive ghc-cassava ghc-cereal ghc-colour ghc-commonmark ghc-commonmark-extensions ghc-commonmark-pandoc ghc-comonad ghc-conduit ghc-conduit-extra ghc-connection ghc-containers ghc-contravariant ghc-cookie ghc-cryptonite ghc-data-default ghc-data-default-class ghc-data-default-instances-containers ghc-data-default-instances-dlist ghc-data-default-instances-old-locale ghc-data-fix ghc-deepseq ghc-digest ghc-directory ghc-distributive ghc-dlist ghc-doclayout ghc-doctemplates ghc-emojis ghc-exceptions ghc-file-embed ghc-filepath ghc-foldable1-classes-compat ghc-generically ghc-ghc-boot-th ghc-gridtables ghc-haddock-library ghc-hashable ghc-haskell-lexer ghc-hourglass ghc-http-client ghc-http-client-tls ghc-http-types ghc-indexed-traversable ghc-indexed-traversable-instances ghc-integer-logarithms ghc-iproute ghc-ipynb ghc-jira-wiki-markup ghc-libyaml ghc-memory ghc-mime-types ghc-mono-traversable ghc-mtl ghc-network ghc-network-uri ghc-old-locale ghc-ordered-containers ghc-pandoc-types ghc-parsec ghc-pem ghc-pretty ghc-pretty-show ghc-primitive ghc-process ghc-random ghc-regex-base ghc-regex-tdfa ghc-resourcet ghc-safe ghc-scientific ghc-semialign ghc-semigroupoids ghc-socks ghc-split ghc-splitmix ghc-stm ghc-streaming-commons ghc-strict ghc-syb ghc-tagged ghc-tagsoup ghc-template-haskell ghc-temporary ghc-texmath ghc-text ghc-text-conversions ghc-text-short ghc-th-abstraction ghc-th-compat ghc-th-lift ghc-th-lift-instances ghc-these ghc-time ghc-time-compat 
ghc-tls ghc-transformers ghc-transformers-compat ghc-typed-process ghc-typst-symbols ghc-unicode-collation ghc-unicode-data ghc-unicode-transforms ghc-uniplate ghc-unix ghc-unliftio-core ghc-unordered-containers ghc-utf8-string ghc-uuid-types ghc-vector ghc-vector-algorithms ghc-vector-stream ghc-witherable ghc-x509 ghc-x509-store ghc-x509-system ghc-x509-validation ghc-xml ghc-xml-conduit ghc-xml-types ghc-yaml ghc-zip-archive ghc-zlib
We're about to push this to become the new default.
Just FYI: the underlying problem is rpm's file conflict check. Most of our packages contain a 'COPYING' or 'LICENSE' file. When such a package is installed or erased, rpm needs to check every other package that also contains a file of that name. The different directories do not matter, as rpm needs to understand "aliased" directories, i.e. directory symlinks (for example, the "/bin -> /usr/bin" symlink).
So when zypper calls rpm for every transaction step, we get O(N^2) time. When using a single transaction, rpm can optimize the check and we get just O(N) time.
For everything else, it does not matter if a single transaction is used or not. It's really just the file conflict check and those license files...
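To make the scaling argument concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not rpm's actual implementation: every package is assumed to ship exactly one file with the basename LICENSE, and the function names `per_package_checks` and `single_transaction_checks` are made up for illustration. The first function mimics one rpm call per package, where each step has to look at every already-installed package sharing a basename. The second mimics a single transaction, where files are grouped by basename in one pass.

```python
from collections import defaultdict


def per_package_checks(packages):
    """Model one transaction per package (hypothetical helper).

    Each step compares the new package against every already-installed
    package that shares at least one file basename with it.
    """
    installed = []  # (name, set of basenames) for packages processed so far
    checks = 0
    for name, basenames in packages:
        for _other_name, other_basenames in installed:
            if basenames & other_basenames:
                checks += 1
        installed.append((name, basenames))
    return checks


def single_transaction_checks(packages):
    """Model a single transaction (hypothetical helper).

    All files are grouped by basename in one pass, so every file is
    touched exactly once.
    """
    index = defaultdict(list)
    touched = 0
    for name, basenames in packages:
        for basename in basenames:
            index[basename].append(name)
            touched += 1
    return touched


# Assumption: 158 packages, each with a single LICENSE file.
packages = [(f"pkg{i}", {"LICENSE"}) for i in range(158)]

print(per_package_checks(packages))         # 158 * 157 / 2 = 12403
print(single_transaction_checks(packages))  # 158
```

With 158 packages the per-package model performs 12403 pairwise checks, while the single-transaction model touches each of the 158 files once. Real rpm does more work per check, but the quadratic versus linear growth is the point here.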
How are we doing in regards to speeding zypper up here?
There were some problems with rpm --root that had to be fixed first; that's why it is not yet the default.
OS/Version: OpenSUSE Tumbleweed 20231030 x86_64, zypper-1.14.66-2.4, libzypp-17.31.23-14.2, libsolv-tools-0.7.25-1.2
Obtain a baseline:
Then exercise zypper. Yes, there's a bigger solver component in zypper than there is in rpm, but it's a fraction of the overall wall time.