puppetlabs / puppetdb

Centralized Puppet Storage
http://docs.puppetlabs.com/puppetdb
Apache License 2.0

(maint) Sync initial-hosts on benchmark re-execution #3918

Closed jpartlow closed 9 months ago

jpartlow commented 9 months ago

When benchmark starts, it generates a list of initial host-maps based on random selections from the catalog/factset/report sample data. This means that every time benchmark is run, a new set of catalogs and facts is pushed to PuppetDB. This can cause a great deal of initial load, as the catalogs and facts are replaced wholesale, until PuppetDB has caught up on processing one entire simulated node interval.

This makes it impractical to stop and restart benchmark when working with large simulated runs.

This patch attempts to improve this by encoding the certname of the original sample file in the catalog version string. We then query the certname and version fields of the catalogs and form an index of certname -> original-certname. This index is then used to ensure that the set of initial host-maps is regenerated from the same base catalog and factset, so there should not be excess churn when benchmark begins pushing commands.
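
As a rough illustration of the indexing idea described above (not the patch itself, which is Clojure), here is a minimal Python sketch. The `::` separator, the function names, and the row/sample shapes are all hypothetical; the actual encoding used in the version string may differ.

```python
import random

# Hypothetical separator between the real version and the sample certname;
# the patch's actual encoding may differ.
SEP = "::"


def encode_version(version, original_certname):
    """Append the sample file's certname to the catalog version string."""
    return f"{version}{SEP}{original_certname}"


def decode_original_certname(version):
    """Return the encoded sample certname, or None if the version has none."""
    _, sep, original = version.rpartition(SEP)
    return original if sep else None


def build_index(rows):
    """Build certname -> original-certname from rows holding certname and version."""
    index = {}
    for row in rows:
        original = decode_original_certname(row["version"])
        if original is not None:
            index[row["certname"]] = original
    return index


def initial_host_maps(certnames, samples, index, rng=random):
    """Reuse the indexed sample for each host; fall back to a random sample.

    `samples` maps a sample certname to its (assumed) catalog data.
    """
    host_maps = {}
    for certname in certnames:
        original = index.get(certname)
        if original not in samples:
            # No usable index entry: behave like a fresh run.
            original = rng.choice(sorted(samples))
        host_maps[certname] = {
            "certname": certname,
            "original": original,
            "catalog": samples[original],
        }
    return host_maps
```

Hosts already known to PuppetDB get the same base sample as before, while hosts with no index entry are assigned a random sample as on a first run.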

jpartlow commented 9 months ago

This is working in limited local testing and is probably a more generically reliable approach. The main downsides compared to #3917 are that it's more convoluted, and I don't know yet whether it has scale issues (querying large catalog sets). I haven't touched the tests yet. I'm going to do some more testing on an ost instance to try to tease out whether there's a performance problem. It could probably stand to be beaten with a Clojure best practices cluebat.

jpartlow commented 9 months ago

The query to get the certname/version index does not take significant time, even with 200,000 nodes in puppetdb.
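
For reference, a query of this shape could be expressed in PQL against the root query endpoint. The sketch below only builds the request URL; the base URL and the function name are assumptions, and this is not necessarily how benchmark issues the query.

```python
from urllib.parse import urlencode


def index_query_url(base_url="http://localhost:8080"):
    """Build a PuppetDB query URL fetching certname and version for all catalogs.

    base_url is an assumption; adjust it for the PuppetDB host being tested.
    """
    pql = "catalogs[certname, version] {}"
    return f"{base_url}/pdb/query/v4?{urlencode({'query': pql})}"
```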

However, I'm still seeing queue spikes when restarting benchmark against an xl with 2 compilers and 100,000 simulated nodes. It took about 2 hours for my installation to work through the churn.

Austin pointed out that when the previous steady-state simulation has been running for days with --rand-perc 100, small catalog changes will have accumulated, so even with the sync there will be significant catalog differences for the system to correct on restart.

jpartlow commented 9 months ago

Since we can't deal with accumulated changes to the catalogs/facts during the tenure of the previous run without extensive querying of the database, closing in favor of a reworked #3917 that can preserve all files in the TempFileBuffer.