puppetlabs / puppetdb

Centralized Puppet Storage
http://docs.puppetlabs.com/puppetdb
Apache License 2.0
298 stars 224 forks source link

Consider avoiding the "stable/volatile facts merge" in puppetdb SQL queries #3955

Closed rbrw closed 3 months ago

rbrw commented 4 months ago

In an attempt to decrease the steady-state Postgres write load PuppetDB currently stores each factset in two separate jsonb fragments, stable and volatile[^1]. Whenever a fact changes, the top-level subtree that contains it is moved from stable to volatile, if it wasn't already in volatile, and never moves it back. (See also: #3956)

When generating queries, PuppetDB just refers to (stable || volatile) via || which reconstructs the full factset, and (it turns out) can be notably expensive. During some scale testing, a simple inventory query effectively filtering on "somefact = x" was observed to take about five seconds with 100k nodes. During investigation, it looked like most of the time was being spent in the examination of the fact value. Rewriting the SQL from roughly (stable || volatile)->somefact to stable->somefact or volatile->somefact decreased the execution time to less than half a second. Charlie also determined via perf that much of the extra time for the original version was being spent in the stable/volatile merge.

Since the stable/volatile split may be worth preserving[^1], consider adjusting PuppetDB to work with the stable and volatile fragments directly (as in the "or" conversion above). This will also require the creation of independent stable and volatile indexes, and may or may not allow dropping the existing, combined (stable||volatile) expresion index.

[^1]: The stable/volatile split relies on an assumption that in typical installations there will be a substantial set of facts that never change. When true, those facts will end up in the stable jsonb factsets column, and if they're also large enough to be TOASTed then the value in the factset row should just be an integer TOAST table rowid, shrinking the size of the row with respect to future rewrites during factset updates. From the TOAST page linked above: "During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change."

github-actions[bot] commented 4 months ago

Migrated issue to PDB-5739