Closed ppanon2022 closed 3 weeks ago
This is definitely a problem with the errata cache bunch task. The history for Bunch errata-cache-bunch shows most tasks being skipped; the status is INTERRUPTED for the ones that were running when the server had to be rebooted for updates. The currently running errata-cache task has been running for 41992 seconds (more than 11 hours).
I temporarily disabled the schedule for that task (actually changed it to once a month), restarted taskomatic, ran pg_cancel_backend on the backlogged/queued errata job's connection, and the CPU usage dropped as expected with no restart. However, I'm guessing that we won't get errata/patch information as a result, and we do want that, so some help in identifying what's causing the problem would be appreciated.
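For anyone repeating the workaround above, here is a minimal sketch of how the long-running backend could be located before cancelling it. It assumes a Python DB-API cursor from a driver that uses `%s` placeholders (psycopg2, for example); the helper names and the one-hour threshold are hypothetical and not part of Uyuni. `pg_stat_activity` and `pg_cancel_backend` are standard PostgreSQL.

```python
# Sketch only: helper names and the default threshold are illustrative assumptions.

# List non-idle backends whose current query started more than N seconds ago.
LONG_RUNNING_SQL = """
SELECT pid,
       EXTRACT(EPOCH FROM (now() - query_start))::bigint AS runtime_seconds,
       state,
       left(query, 80) AS query_head
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > %s * interval '1 second'
ORDER BY query_start
"""

# pg_cancel_backend cancels the current query of a backend without
# terminating its connection; it returns true if the signal was sent.
CANCEL_SQL = "SELECT pg_cancel_backend(%s)"


def find_long_running(cursor, min_seconds=3600):
    """Return rows (pid, runtime_seconds, state, query_head) for slow backends."""
    cursor.execute(LONG_RUNNING_SQL, (min_seconds,))
    return cursor.fetchall()


def cancel_backend(cursor, pid):
    """Ask PostgreSQL to cancel the running query of the given backend pid."""
    cursor.execute(CANCEL_SQL, (pid,))
    return bool(cursor.fetchone()[0])
```

The cursor is passed in rather than created here, so the sketch makes no assumption about how the Uyuni database connection is configured. Cancelling only stops the current query; as noted above, the scheduled task still has to be disabled or it will simply start again.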
Note the information in discussion 8978. aaannz suggested running a vacuum analyze on a number of tables:
ANALYZE VERBOSE rhnServer, rhnServerPackage, rhnPackageEvr, rhnPackageUpgradeArchCompat, rhnServerChannel, rhnChannelPackage, rhnChannelErrata, rhnErrataPackage, rhnErrata, rhnServerNeededCache;
There were thousands of NOTICE warnings during the analysis of the rhnpackageevr table. If those mean that the index is effectively unusable, requiring a table scan, then that could totally trash performance.
INFO: analyzing "public.rhnserver"
INFO: "rhnserver": scanned 649 of 649 pages, containing 941 live rows and 51 dead rows; 941 rows in sample, 941 estimated total rows
INFO: analyzing "public.rhnserverpackage"
INFO: "rhnserverpackage": scanned 20745 of 20745 pages, containing 1912211 live rows and 19432 dead rows; 30000 rows in sample, 1912211 estimated total rows
INFO: analyzing "public.rhnpackageevr"
INFO: "rhnpackageevr": scanned 1403 of 1403 pages, containing 104016 live rows and 27 dead rows; 30000 rows in sample, 104016 estimated total rows
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using deb
NOTICE: comparing incompatible evr types. Using deb
... Skipping thousands of those NOTICE: comparing incompatible evr types lines ...
INFO: analyzing "public.rhnpackageupgradearchcompat"
INFO: "rhnpackageupgradearchcompat": scanned 1 of 1 pages, containing 127 live rows and 0 dead rows; 127 rows in sample, 127 estimated total rows
INFO: analyzing "public.rhnserverchannel"
INFO: "rhnserverchannel": scanned 51 of 51 pages, containing 5065 live rows and 171 dead rows; 5065 rows in sample, 5065 estimated total rows
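With thousands of NOTICE lines in the output, it can help to summarize the log rather than read it. The following is a rough sketch, not an Uyuni tool: it parses pasted ANALYZE VERBOSE output with regular expressions written against the line shapes shown above, and reports live/dead row counts per table plus how many NOTICE lines fell back to each evr type. The function name and the sample excerpt are only illustrative.

```python
import re
from collections import Counter

# Per-table summary line as printed in the log above.
TABLE_RE = re.compile(
    r'INFO:\s+"(?P<table>\w+)": scanned \d+ of \d+ pages, '
    r'containing (?P<live>\d+) live rows and (?P<dead>\d+) dead rows'
)
# NOTICE lines as printed in the log above ("Using rpm" / "Using deb").
NOTICE_RE = re.compile(
    r'NOTICE:\s+comparing incompatible evr types\. Using (?P<kind>\w+)'
)


def summarize_analyze_log(text):
    """Return (per-table row stats, NOTICE counts keyed by evr type)."""
    tables = {}
    for m in TABLE_RE.finditer(text):
        live, dead = int(m["live"]), int(m["dead"])
        total = live + dead
        tables[m["table"]] = {
            "live": live,
            "dead": dead,
            "dead_pct": round(100.0 * dead / total, 2) if total else 0.0,
        }
    notices = Counter(m["kind"] for m in NOTICE_RE.finditer(text))
    return tables, notices


# Short excerpt of the output quoted above.
SAMPLE = '''
INFO: analyzing "public.rhnserver"
INFO: "rhnserver": scanned 649 of 649 pages, containing 941 live rows and 51 dead rows; 941 rows in sample, 941 estimated total rows
INFO: analyzing "public.rhnpackageevr"
INFO: "rhnpackageevr": scanned 1403 of 1403 pages, containing 104016 live rows and 27 dead rows; 30000 rows in sample, 104016 estimated total rows
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using rpm
NOTICE: comparing incompatible evr types. Using deb
NOTICE: comparing incompatible evr types. Using deb
'''

tables, notices = summarize_analyze_log(SAMPLE)
for name, stats in tables.items():
    print(name, stats)
print(dict(notices))
```

The counts say how often the comparison fallback was triggered during the analysis, not whether the index is usable; that question still needs an answer from someone who knows the evr type implementation.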
This appears to have been fixed in 2024.07.
Problem description
PostgreSQL processes are constantly hogging the CPU. Those processes are linked to the Taskomatic process running com.redhat.rhn.manager.errata.cache.UpdateErrataCacheCommand. See https://github.com/uyuni-project/uyuni/discussions/8978 for the investigation process followed.
Steps to reproduce
Uyuni version
Uyuni proxy version (if used)
No response
Useful logs
Additional information
The excessive CPU load (and perhaps record locking) caused an LCM project promotion to fail. Attempting to restart the project promotion also seems to fail.