puppetlabs / puppetdb

Centralized Puppet Storage
http://docs.puppetlabs.com/puppetdb
Apache License 2.0

(PE-37294) Remove duplicate storage of catalog resource parameters #3922

Closed austb closed 8 months ago

austb commented 9 months ago

Resolves #3923 Resolves #3924

austb commented 8 months ago

This is of marginal benefit. When processing catalogs that took roughly 200ms each, this saved less than 5ms per catalog, so it would be expected to give no more than a 1-2% improvement in catalog storage times. It also appeared to cause some performance regressions in querying.

In the initial storage, most of the time is spent inserting catalog resources, not parameters. At ~100ms, that insert does not seem abnormally slow for Postgres, and we do appear to be constructing our batched insertions properly, but we should look into optimization options there.
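For readers unfamiliar with what "batched insertions" means here, a minimal sketch follows. It is not PuppetDB's code: `sqlite3` stands in for Postgres so the example runs without a server, and the table layout, column names, and `batch_size` are hypothetical.

```python
import sqlite3

def insert_resources_batched(conn, rows, batch_size=500):
    """Insert rows in fixed-size batches; return the number of batches issued.

    Illustrative only: table and columns are assumptions, not PuppetDB's schema.
    """
    batches = 0
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        # One executemany call per chunk instead of one statement per row.
        conn.executemany(
            "INSERT INTO catalog_resources (certname, type, title) VALUES (?, ?, ?)",
            chunk,
        )
        batches += 1
    conn.commit()
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog_resources (certname TEXT, type TEXT, title TEXT)")
rows = [("node1.example.com", "File", f"/tmp/file{i}") for i in range(1200)]
batch_count = insert_resources_batched(conn, rows)
row_count = conn.execute("SELECT COUNT(*) FROM catalog_resources").fetchone()[0]
```

With 1200 rows and a batch size of 500, the loop issues three batches. Against a real Postgres server the gain comes from fewer round trips, which an in-memory database does not show.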

After initial storage, about 25% of the time (in my testing, approx. 25ms of 95ms) is spent calculating the catalog-similarity-hash to determine if the catalog hash changed. In this run rand-perc was set to 100%, so there was always an update, and another ~12ms was spent calculating the resource hashes. The remaining time is split between ~10ms updating metadata and ~45ms updating resources, which is itself mostly a set of <10ms operations. Given that most of the time catalogs do not change, improving the catalog similarity hash would give the best overall performance improvement in the steady state.
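The reasoning behind that last point can be sketched with the approximate timings quoted above. The listed components sum to about 92ms rather than the measured ~95ms, since each figure is rounded. The model assumes an unchanged catalog pays only for the similarity hash, which this comment does not confirm, and `change_rate` is a hypothetical parameter.

```python
# Approximate per-catalog timings (ms) from the comment above, rand-perc = 100%.
STEADY_STATE_MS = {
    "catalog_similarity_hash": 25,
    "resource_hashes": 12,
    "update_metadata": 10,
    "update_resources": 45,
}

def breakdown(timings):
    """Return each component's share of the summed time, in percent."""
    total = sum(timings.values())
    return {name: round(100 * ms / total, 1) for name, ms in timings.items()}

def expected_ms(timings, change_rate):
    """Expected cost per catalog if only a fraction of catalogs change.

    Assumption: the similarity hash is always computed, and all other work
    happens only when the catalog changed.
    """
    hash_ms = timings["catalog_similarity_hash"]
    update_ms = sum(ms for name, ms in timings.items()
                    if name != "catalog_similarity_hash")
    return hash_ms + change_rate * update_ms

shares = breakdown(STEADY_STATE_MS)
all_changed = expected_ms(STEADY_STATE_MS, 1.0)    # every catalog changes
mostly_same = expected_ms(STEADY_STATE_MS, 0.1)    # 10% of catalogs change
hash_share = STEADY_STATE_MS["catalog_similarity_hash"] / mostly_same
```

Under these assumptions, the lower the change rate, the larger the hash's share of total time, which is why it is the most promising target in the steady state.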