openSUSE / zypper

World's most powerful command line package manager
http://en.opensuse.org/Portal:Zypper

zypper is really slow #513

Open jengelh opened 1 year ago

jengelh commented 1 year ago

OS/Version: OpenSUSE Tumbleweed 20231030 x86_64, zypper-1.14.66-2.4, libzypp-17.31.23-14.2, libsolv-tools-0.7.25-1.2

Obtain a baseline:

# zypper --non-interactive in --download-only $(cat ghclist)
# time rpm -Uhv /var/cache/zypp/packages/base/x86_64/ghc-*.rpm
…
 157:ghc-ordered-containers-0.2.3-1.7 ################################# [ 99%]
 158:ghc-utf8-string-1.0.2-2.10       ################################# [100%]

real    0m2.694s
user    0m1.119s
sys     0m0.672s
# sync; time rpm -e $(rpm -qa 'ghc-*')
real    0m2.933s
user    0m1.482s
sys     0m0.636s

Then exercise zypper. Yes, there's a bigger solver component in zypper than there is in rpm, but it accounts for only a fraction of the overall wall time.

# time zypper --non-interactive in $(cat ghclist)
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 158 NEW packages are going to be installed:
  ghc-Glob ghc-JuicyPixels ghc-OneTuple ghc-Only ghc-QuickCheck ghc-SHA ghc-StateVar ghc-aeson ghc-aeson-pretty
  ghc-ansi-terminal ghc-ansi-terminal-types ghc-appar ghc-array ghc-asn1-encoding ghc-asn1-parse ghc-asn1-types ghc-assoc
  ghc-async ghc-attoparsec ghc-base ghc-base-compat ghc-base-compat-batteries ghc-base-orphans ghc-base16-bytestring ghc-base64
  ghc-base64-bytestring ghc-basement ghc-bifunctors ghc-binary ghc-bitvec ghc-blaze-builder ghc-blaze-html ghc-blaze-markup
  ghc-byteorder ghc-bytestring ghc-case-insensitive ghc-cassava ghc-cereal ghc-colour ghc-commonmark ghc-commonmark-extensions
  ghc-commonmark-pandoc ghc-comonad ghc-conduit ghc-conduit-extra ghc-connection ghc-containers ghc-contravariant ghc-cookie
  ghc-cryptonite ghc-data-default ghc-data-default-class ghc-data-default-instances-containers ghc-data-default-instances-dlist
  ghc-data-default-instances-old-locale ghc-data-fix ghc-deepseq ghc-digest ghc-directory ghc-distributive ghc-dlist
  ghc-doclayout ghc-doctemplates ghc-emojis ghc-exceptions ghc-file-embed ghc-filepath ghc-foldable1-classes-compat
  ghc-generically ghc-ghc-boot-th ghc-gridtables ghc-haddock-library ghc-hashable ghc-haskell-lexer ghc-hourglass
  ghc-http-client ghc-http-client-tls ghc-http-types ghc-indexed-traversable ghc-indexed-traversable-instances
  ghc-integer-logarithms ghc-iproute ghc-ipynb ghc-jira-wiki-markup ghc-libyaml ghc-memory ghc-mime-types ghc-mono-traversable
  ghc-mtl ghc-network ghc-network-uri ghc-old-locale ghc-ordered-containers ghc-pandoc-types ghc-parsec ghc-pem ghc-pretty
  ghc-pretty-show ghc-primitive ghc-process ghc-random ghc-regex-base ghc-regex-tdfa ghc-resourcet ghc-safe ghc-scientific
  ghc-semialign ghc-semigroupoids ghc-socks ghc-split ghc-splitmix ghc-stm ghc-streaming-commons ghc-strict ghc-syb ghc-tagged
  ghc-tagsoup ghc-template-haskell ghc-temporary ghc-texmath ghc-text ghc-text-conversions ghc-text-short ghc-th-abstraction
  ghc-th-compat ghc-th-lift ghc-th-lift-instances ghc-these ghc-time ghc-time-compat ghc-tls ghc-transformers
  ghc-transformers-compat ghc-typed-process ghc-typst-symbols ghc-unicode-collation ghc-unicode-data ghc-unicode-transforms
  ghc-uniplate ghc-unix ghc-unliftio-core ghc-unordered-containers ghc-utf8-string ghc-uuid-types ghc-vector
  ghc-vector-algorithms ghc-vector-stream ghc-witherable ghc-x509 ghc-x509-store ghc-x509-system ghc-x509-validation ghc-xml
  ghc-xml-conduit ghc-xml-types ghc-yaml ghc-zip-archive ghc-zlib

158 new packages to install.
Overall download size: 0 B. Already cached: 31.4 MiB. After the operation, additional 211.3 MiB will be used.
Continue? [y/n/v/...? shows all options] (y): y
In cache ghc-base-4.17.2.0-1.2.x86_64.rpm                                                                (1/158),   3.1 MiB    
…
(158/158) Installing: ghc-http-client-tls-0.3.6.1-2.10.x86_64 ...........................................................[done]
Running post-transaction scripts ........................................................................................[done]
real    0m39.364s
user    0m19.816s
sys     0m17.949s

# time zypper --non-interactive rm 'ghc-*'
…
(158/158) Removing ghc-base-4.17.2.0-1.2.x86_64 .........................................................................[done]
Running post-transaction scripts ........................................................................................[done]
There are running programs which still use files and libraries deleted or updated by recent upgrades. They should be restarted to benefit from the latest updates. Run 'zypper ps -s' to list these programs.
real    0m30.475s
user    0m16.013s
sys     0m13.089s

jengelh commented 1 year ago

Cross-checking the expected baseline with dnf:

# dnf5 install --downloadonly $(cat ghclist)
# sync; time dnf5 install -y $(cat ghclist)
…
[158/158] Total                                                                        100% |   0.0   B/s |   0.0   B |  00m00s
Running transaction
[  1/160] Verify package files                                                         100% | 993.0   B/s | 158.0   B |  00m00s
[  2/160] Prepare transaction                                                          100% |   1.0 KiB/s | 158.0   B |  00m00s
[  3/160] Installing ghc-base-0:4.17.2.0-1.2.x86_64                                    100% | 189.0 MiB/s |  19.3 MiB |  00m00s
…
[160/160] Installing ghc-utf8-string-0:1.0.2-2.10.x86_64                               100% |   1.2 MiB/s | 296.6 KiB |  00m00s

real    0m4.599s
user    0m2.451s
sys     0m1.128s

# sync; time dnf5 remove -y 'ghc-*'
…
Transaction Summary:
 Removing:        158 packages

After this operation 211 MiB will be freed (install 0 B, remove 211 MiB).

Running transaction
[  1/159] Prepare transaction                                                          100% |   1.0 KiB/s | 158.0   B |  00m00s
[  2/159] Erasing ghc-commonmark-pandoc-0:0.2.1.3-2.15.x86_64                          100% | 444.0   B/s |   4.0   B |  00m00s
…
[159/159] Erasing ghc-base-0:4.17.2.0-1.2.x86_64                                       100% | 108.0   B/s |  16.0   B |  00m00s
>>> Running post-uninstall scriptlet: ghc-base-0:4.17.2.0-1.2.x86_64
>>> Stop post-uninstall scriptlet: ghc-base-0:4.17.2.0-1.2.x86_64

real    0m2.587s
user    0m1.078s
sys     0m0.820s

mlandres commented 1 year ago

Maybe you can provide us with the zypper.log of the install and remove commands. And if you're about to measure, it would be interesting to see whether the numbers for ZYPP_SINGLE_RPMTRANS=1 zypper ... also differ that much. We are aware that several actions unrelated to installation consume extra time and could be improved; most of them are related to repo autorefresh and cache building. The traditional zypp backend is hard to compare with rpm/dnf, because the traditional backend forks and execs rpm for each single package, whereas the SINGLE_RPMTRANS backend uses librpm to form a single transaction. The pre- and post-processing unrelated to installation, however, is the same for both backends.
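For anyone who wants to repeat the measurement with the single-transaction backend, a minimal sketch could look like the following. It reuses the install/remove commands from the report and only adds the ZYPP_SINGLE_RPMTRANS=1 environment variable mentioned above. The names ZYPPER, PKGLIST, timed and single_trans_cycle are hypothetical helpers made up for this example, not part of zypper, and the timing is only accurate to whole seconds.

```shell
# Sketch only. ZYPPER and PKGLIST are hypothetical knobs for this example,
# not zypper options. Run as root, with the packages already in the cache.
ZYPPER=${ZYPPER:-zypper}
PKGLIST=${PKGLIST:-ghclist}

# Run "$@" and print its wall-clock duration (whole seconds) to stderr.
timed() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
    return $status
}

# Install and then remove the package set with ZYPP_SINGLE_RPMTRANS=1 set,
# so that the run can be compared with the numbers from the report.
single_trans_cycle() {
    timed env ZYPP_SINGLE_RPMTRANS=1 $ZYPPER --non-interactive in $(cat "$PKGLIST")
    timed env ZYPP_SINGLE_RPMTRANS=1 $ZYPPER --non-interactive rm 'ghc-*'
}
```

Calling single_trans_cycle then prints one "elapsed" line for the install and one for the removal.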

jengelh commented 1 year ago

ZYPP_SINGLE_RPMTRANS=1 provides the desirable execution time characteristics :+1: . Can this be made default?

The commands for obtaining the timings are shown above. Today I am getting a runtime of about 15s (my system was probably a bit loaded at the time of the report). The exact list of the 158 packages was:

ghc-Glob ghc-JuicyPixels ghc-OneTuple ghc-Only ghc-QuickCheck ghc-SHA ghc-StateVar ghc-aeson ghc-aeson-pretty ghc-ansi-terminal ghc-ansi-terminal-types ghc-appar ghc-array ghc-asn1-encoding ghc-asn1-parse ghc-asn1-types ghc-assoc ghc-async ghc-attoparsec ghc-base ghc-base-compat ghc-base-compat-batteries ghc-base-orphans ghc-base16-bytestring ghc-base64 ghc-base64-bytestring ghc-basement ghc-bifunctors ghc-binary ghc-bitvec ghc-blaze-builder ghc-blaze-html ghc-blaze-markup ghc-byteorder ghc-bytestring ghc-case-insensitive ghc-cassava ghc-cereal ghc-colour ghc-commonmark ghc-commonmark-extensions ghc-commonmark-pandoc ghc-comonad ghc-conduit ghc-conduit-extra ghc-connection ghc-containers ghc-contravariant ghc-cookie ghc-cryptonite ghc-data-default ghc-data-default-class ghc-data-default-instances-containers ghc-data-default-instances-dlist ghc-data-default-instances-old-locale ghc-data-fix ghc-deepseq ghc-digest ghc-directory ghc-distributive ghc-dlist ghc-doclayout ghc-doctemplates ghc-emojis ghc-exceptions ghc-file-embed ghc-filepath ghc-foldable1-classes-compat ghc-generically ghc-ghc-boot-th ghc-gridtables ghc-haddock-library ghc-hashable ghc-haskell-lexer ghc-hourglass ghc-http-client ghc-http-client-tls ghc-http-types ghc-indexed-traversable ghc-indexed-traversable-instances ghc-integer-logarithms ghc-iproute ghc-ipynb ghc-jira-wiki-markup ghc-libyaml ghc-memory ghc-mime-types ghc-mono-traversable ghc-mtl ghc-network ghc-network-uri ghc-old-locale ghc-ordered-containers ghc-pandoc-types ghc-parsec ghc-pem ghc-pretty ghc-pretty-show ghc-primitive ghc-process ghc-random ghc-regex-base ghc-regex-tdfa ghc-resourcet ghc-safe ghc-scientific ghc-semialign ghc-semigroupoids ghc-socks ghc-split ghc-splitmix ghc-stm ghc-streaming-commons ghc-strict ghc-syb ghc-tagged ghc-tagsoup ghc-template-haskell ghc-temporary ghc-texmath ghc-text ghc-text-conversions ghc-text-short ghc-th-abstraction ghc-th-compat ghc-th-lift ghc-th-lift-instances ghc-these ghc-time ghc-time-compat 
ghc-tls ghc-transformers ghc-transformers-compat ghc-typed-process ghc-typst-symbols ghc-unicode-collation ghc-unicode-data ghc-unicode-transforms ghc-uniplate ghc-unix ghc-unliftio-core ghc-unordered-containers ghc-utf8-string ghc-uuid-types ghc-vector ghc-vector-algorithms ghc-vector-stream ghc-witherable ghc-x509 ghc-x509-store ghc-x509-system ghc-x509-validation ghc-xml ghc-xml-conduit ghc-xml-types ghc-yaml ghc-zip-archive ghc-zlib

mlandres commented 1 year ago

We're about to push this to become the new default.

mlschroe commented 1 year ago

Just FYI: the underlying problem is rpm's file conflict check. Most of our packages contain a 'COPYING' or 'LICENSE' file. When such a package is installed or erased, rpm needs to check every other package that also contains such a file. The different directories do not matter, as rpm needs to understand "aliased" directories, i.e. directory symlinks (an example is the "/bin -> /usr/bin" symlink).

So when zypper calls rpm for every transaction step, we get O(N^2) time. When using a single transaction, rpm can optimize the check and we get just O(N) time.

For everything else, it does not matter if a single transaction is used or not. It's really just the file conflict check and those license files...
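To put rough numbers on the O(N^2) versus O(N) difference, here is a toy cost model. It is not rpm's actual algorithm: it assumes that every package ships one identically named license file, that step k of a per-package install has to compare against the k-1 packages installed by the earlier steps, and that a single transaction needs one pass over all N packages. Packages already on the system are ignored, and the function names are made up for this example.

```shell
# Toy model only, not rpm's real conflict check.

# One rpm transaction per package: step k compares its license file
# against the k-1 packages installed by the earlier steps.
checks_per_step() {
    n=$1
    total=0
    k=1
    while [ "$k" -le "$n" ]; do
        total=$((total + k - 1))
        k=$((k + 1))
    done
    echo "$total"
}

# One transaction for everything: modeled as a single pass over N packages.
checks_single_transaction() {
    echo "$1"
}

checks_per_step 158             # prints 12403
checks_single_transaction 158   # prints 158
```

For the 158 ghc packages that is 12403 comparisons against 158 in this model, which is in line with the explanation above even though the real costs inside rpm will differ.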

dirkmueller commented 8 months ago

How are we doing in regards to speeding zypper up here?

bzeller commented 8 months ago

> How are we doing in regards to speeding zypper up here?

There were some problems with rpm --root that had to be fixed first; that's why it is not yet the default.