Open sharewax opened 6 months ago
Some metrics from puppetdb: Singapure:
# curl -s -G -H 'Accept: application/json' --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/$HOSTNAME.pem --key /etc/puppetlabs/puppet/ssl/private_keys/$HOSTNAME.pem https://$HOSTNAME:8081/metrics/v2/read/puppetlabs.puppetdb.database:name=PDBWritePool.pool.Wait | jq .
{
"request": {
"mbean": "puppetlabs.puppetdb.database:name=PDBWritePool.pool.Wait",
"type": "read"
},
"value": {
"75thPercentile": 183.023337,
"Mean": 124.2630601327565,
"StdDev": 85.63176999191273,
"98thPercentile": 185.45835,
"RateUnit": "events/second",
"95thPercentile": 184.71381,
"99thPercentile": 187.21545799999998,
"Max": 190.855177,
"Count": 220904,
"FiveMinuteRate": 0.8528680707338815,
"50thPercentile": 182.96111299999998,
"MeanRate": 0.7860066475101025,
"Min": 0.0010249999999999999,
"OneMinuteRate": 0.9908548663105827,
"DurationUnit": "milliseconds",
"999thPercentile": 187.21545799999998,
"FifteenMinuteRate": 0.7954050920023567
},
"timestamp": 1715284731,
"status": 200
}
EU (local):
# curl -s -G -H 'Accept: application/json' --cacert /etc/puppetlabs/puppet/ssl/certs/ca.pem --cert /etc/puppetlabs/puppet/ssl/certs/$HOSTNAME.pem --key /etc/puppetlabs/puppet/ssl/private_keys/$HOSTNAME.pem https://$HOSTNAME:8081/metrics/v2/read/puppetlabs.puppetdb.database:name=PDBWritePool.pool.Wait | jq .
{
"request": {
"mbean": "puppetlabs.puppetdb.database:name=PDBWritePool.pool.Wait",
"type": "read"
},
"value": {
"75thPercentile": 0.29408,
"Mean": 0.13560441251423003,
"StdDev": 0.24515121048637722,
"98thPercentile": 0.431164,
"RateUnit": "events/second",
"95thPercentile": 0.39854,
"99thPercentile": 0.451405,
"Max": 8.078914,
"Count": 457080,
"FiveMinuteRate": 3.2667841655103405,
"50thPercentile": 0.0018239999999999999,
"MeanRate": 3.332225393092325,
"Min": 0.00048199999999999995,
"OneMinuteRate": 3.1529805478178483,
"DurationUnit": "milliseconds",
"999thPercentile": 2.589778,
"FifteenMinuteRate": 3.3267313956147135
},
"timestamp": 1715284792,
"status": 200
}
What Postgres version are you using?
15.6
Ok, that's a new enough version that you are getting the best behavior we have (PG 14 or greater).
Some background: GC of reports is very fast now that we have partitioned tables because we can drop whole tables at a time, but the database locks it acquires to do that are more blocking than deleting a row. The result of this is that once PuppetDB's GC has started to wait for a lock, we can block both commands and queries that come in afterwards until both the in-flight queries and the GC are complete. Since we don't know how long all in-flight user queries will take, we could be blocking future commands and queries indefinitely. To solve that we have chosen to rely on this "bulldozer" that allows us to get the heavily locking GC work done as quickly as possible.
Both queries and command processing automatically retry failed PG operations, so just because the error shows up in the PG log does not mean that an attempt to query actually failed. If the user query fails, we would consider that a bug, and you should also see the error logged in PuppetDB's logs then.
Yes, on Singapure puppetdb I see the next things
zcat puppetdb-2024-05-09.0.log.gz | rg FATAL -A 200
org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:134)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
at clojure.java.jdbc$execute_query_with_params.invokeStatic(jdbc.clj:1090)
at clojure.java.jdbc$execute_query_with_params.invoke(jdbc.clj:1084)
at clojure.java.jdbc$db_query_with_resultset_STAR_.invokeStatic(jdbc.clj:1106)
at clojure.java.jdbc$db_query_with_resultset_STAR_.invoke(jdbc.clj:1093)
at clojure.java.jdbc$db_query_with_resultset.invokeStatic(jdbc.clj:1140)
at clojure.java.jdbc$db_query_with_resultset.invoke(jdbc.clj:1119)
at clojure.java.jdbc$db_query_with_resultset.invokeStatic(jdbc.clj:1127)
at clojure.java.jdbc$db_query_with_resultset.invoke(jdbc.clj:1119)
at puppetlabs.puppetdb.jdbc$query_with_resultset.invokeStatic(jdbc.clj:236)
at puppetlabs.puppetdb.jdbc$query_with_resultset.invoke(jdbc.clj:227)
at puppetlabs.puppetdb.scf.storage$fn__27708$certname_factset_metadata__27713$fn__27714.invoke(storage.clj:1172)
at puppetlabs.puppetdb.scf.storage$fn__27708$certname_factset_metadata__27713.invoke(storage.clj:1165)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787$fn__27791$fn__27793.invoke(storage.clj:1268)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:860)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787$fn__27791.invoke(storage.clj:1266)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787.invoke(storage.clj:1260)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115$fn__28119$fn__28121.invoke(storage.clj:1965)
at puppetlabs.puppetdb.scf.storage.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115$fn__28119.invoke(storage.clj:1962)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115.invoke(storage.clj:1955)
at puppetlabs.puppetdb.command$do_replace_facts.invokeStatic(command.clj:401)
at puppetlabs.puppetdb.command$do_replace_facts.invoke(command.clj:399)
at puppetlabs.puppetdb.command$exec_replace_facts$fn__32666.invoke(command.clj:409)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104$fn__22117.invoke(jdbc.clj:553)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:807)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104.invoke(jdbc.clj:549)
at puppetlabs.puppetdb.jdbc$retry_sql$attempt__22084.invoke(jdbc.clj:497)
at puppetlabs.puppetdb.jdbc$retry_sql.invokeStatic(jdbc.clj:507)
at puppetlabs.puppetdb.jdbc$retry_sql.invoke(jdbc.clj:488)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invokeStatic(jdbc.clj:546)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invoke(jdbc.clj:541)
at puppetlabs.puppetdb.command$exec_replace_facts.invokeStatic(command.clj:406)
at puppetlabs.puppetdb.command$exec_replace_facts.invoke(command.clj:403)
at puppetlabs.puppetdb.command$exec_command.invokeStatic(command.clj:525)
at puppetlabs.puppetdb.command$exec_command.invoke(command.clj:510)
at puppetlabs.puppetdb.command$attempt_exec_command$fn__32843.invoke(command.clj:611)
at puppetlabs.puppetdb.command$call_with_quick_retry$fn__32832.invoke(command.clj:591)
at puppetlabs.puppetdb.command$call_with_quick_retry.invokeStatic(command.clj:589)
at puppetlabs.puppetdb.command$call_with_quick_retry.invoke(command.clj:587)
at puppetlabs.puppetdb.command$attempt_exec_command.invokeStatic(command.clj:609)
at puppetlabs.puppetdb.command$attempt_exec_command.invoke(command.clj:605)
at puppetlabs.puppetdb.command$process_cmd$fn__32899$fn__32900.invoke(command.clj:732)
at puppetlabs.puppetdb.command$process_cmd$fn__32899.invoke(command.clj:729)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invokeStatic(metrics.clj:26)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invoke(metrics.clj:15)
at puppetlabs.puppetdb.command$process_cmd.invokeStatic(command.clj:726)
at puppetlabs.puppetdb.command$process_cmd.invoke(command.clj:712)
at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:819)
at puppetlabs.puppetdb.command$process_message.invoke(command.clj:775)
at puppetlabs.puppetdb.command$message_handler$fn__32933.invoke(command.clj:853)
at puppetlabs.puppetdb.threadpool$dochan$fn__32514$fn__32515.invoke(threadpool.clj:115)
at puppetlabs.puppetdb.threadpool$gated_execute$fn__32475.invoke(threadpool.clj:68)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-05-09T05:53:44.412Z WARN [c.z.h.p.ProxyConnection] PDBWritePool - Connection org.postgresql.jdbc.PgConnection@2ecb6f29 marked as broken because of SQLSTATE(57P01), ErrorCode(0)
java.sql.BatchUpdateException: Batch entry 0 insert into certnames (certname) values ('sg2-sl-b60.be.core.pw') on conflict (certname) do update set deactivated=null, expired=null where (certnames.deactivated < '2024-05-09 05:51:48.896+00' or certnames.expired < '2024-05-09 05:51:48.896+00') was aborted: FATAL: terminating connection due to administrator command
Where: while locking tuple (33,96) in relation "certnames" Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2367)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:560)
at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:893)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:916)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1684)
at com.zaxxer.hikari.pool.ProxyStatement.executeBatch(ProxyStatement.java:128)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeBatch(HikariProxyPreparedStatement.java)
at clojure.java.jdbc$execute_batch.invokeStatic(jdbc.clj:598)
at clojure.java.jdbc$execute_batch.invoke(jdbc.clj:591)
at clojure.java.jdbc$db_do_execute_prepared_statement$fn__20458.invoke(jdbc.clj:1057)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:860)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:789)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at clojure.java.jdbc$db_do_execute_prepared_statement.invokeStatic(jdbc.clj:1056)
at clojure.java.jdbc$db_do_execute_prepared_statement.invoke(jdbc.clj:1042)
at clojure.java.jdbc$db_do_prepared.invokeStatic(jdbc.clj:1080)
at clojure.java.jdbc$db_do_prepared.invoke(jdbc.clj:1060)
at puppetlabs.puppetdb.jdbc$do_prepared.invokeStatic(jdbc.clj:72)
at puppetlabs.puppetdb.jdbc$do_prepared.doInvoke(jdbc.clj:67)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at puppetlabs.puppetdb.scf.storage$fn__28023$maybe_activate_node_BANG___28028$fn__28029.invoke(storage.clj:1898)
at puppetlabs.puppetdb.scf.storage$fn__28023$maybe_activate_node_BANG___28028.invoke(storage.clj:1890)
at puppetlabs.puppetdb.command$do_replace_catalog.invokeStatic(command.clj:344)
at puppetlabs.puppetdb.command$do_replace_catalog.invoke(command.clj:343)
at puppetlabs.puppetdb.command$exec_replace_catalog$fn__32626.invoke(command.clj:353)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104$fn__22117.invoke(jdbc.clj:553)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:807)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104.invoke(jdbc.clj:549)
at puppetlabs.puppetdb.jdbc$retry_sql$attempt__22084.invoke(jdbc.clj:497)
at puppetlabs.puppetdb.jdbc$retry_sql.invokeStatic(jdbc.clj:507)
at puppetlabs.puppetdb.jdbc$retry_sql.invoke(jdbc.clj:488)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invokeStatic(jdbc.clj:546)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invoke(jdbc.clj:541)
at puppetlabs.puppetdb.command$exec_replace_catalog.invokeStatic(command.clj:350)
at puppetlabs.puppetdb.command$exec_replace_catalog.invoke(command.clj:347)
at puppetlabs.puppetdb.command$exec_command.invokeStatic(command.clj:525)
at puppetlabs.puppetdb.command$exec_command.invoke(command.clj:510)
at puppetlabs.puppetdb.command$attempt_exec_command$fn__32843.invoke(command.clj:611)
at puppetlabs.puppetdb.command$call_with_quick_retry$fn__32832.invoke(command.clj:591)
at puppetlabs.puppetdb.command$call_with_quick_retry.invokeStatic(command.clj:589)
at puppetlabs.puppetdb.command$call_with_quick_retry.invoke(command.clj:587)
at puppetlabs.puppetdb.command$attempt_exec_command.invokeStatic(command.clj:609)
at puppetlabs.puppetdb.command$attempt_exec_command.invoke(command.clj:605)
at puppetlabs.puppetdb.command$process_cmd$fn__32899$fn__32900.invoke(command.clj:732)
at puppetlabs.puppetdb.command$process_cmd$fn__32899.invoke(command.clj:729)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invokeStatic(metrics.clj:26)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invoke(metrics.clj:15)
at puppetlabs.puppetdb.command$process_cmd.invokeStatic(command.clj:726)
at puppetlabs.puppetdb.command$process_cmd.invoke(command.clj:712)
at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:819)
at puppetlabs.puppetdb.command$process_message.invoke(command.clj:775)
at puppetlabs.puppetdb.command$message_handler$fn__32933.invoke(command.clj:853)
at puppetlabs.puppetdb.threadpool$dochan$fn__32514$fn__32515.invoke(threadpool.clj:115)
at puppetlabs.puppetdb.threadpool$gated_execute$fn__32475.invoke(threadpool.clj:68)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
Where: while locking tuple (33,96) in relation "certnames"
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
... 67 common frames omitted
2024-05-09T05:53:44.617Z WARN [c.z.h.p.ProxyConnection] PDBWritePool - Connection org.postgresql.jdbc.PgConnection@78c2b0c0 marked as broken because of SQLSTATE(57P01), ErrorCode(0)
org.postgresql.util.PSQLException: FATAL: terminating connection due to administrator command
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2676)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:356)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:496)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:413)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:190)
at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:152)
at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeUpdate(ProxyPreparedStatement.java:61)
at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeUpdate(HikariProxyPreparedStatement.java)
at clojure.java.jdbc$db_do_execute_prepared_return_keys$exec_and_return_keys__20429.invoke(jdbc.clj:979)
at clojure.java.jdbc$db_do_execute_prepared_return_keys.invokeStatic(jdbc.clj:1007)
at clojure.java.jdbc$db_do_execute_prepared_return_keys.invoke(jdbc.clj:963)
at clojure.java.jdbc$db_do_prepared_return_keys.invokeStatic(jdbc.clj:1038)
at clojure.java.jdbc$db_do_prepared_return_keys.invoke(jdbc.clj:1015)
at puppetlabs.puppetdb.jdbc$multi_insert_helper$fn__21900.invoke(jdbc.clj:161)
at clojure.core$map$fn__5935.invoke(core.clj:2772)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:51)
at clojure.lang.Cons.next(Cons.java:39)
at clojure.lang.RT.next(RT.java:713)
at clojure.core$next__5451.invokeStatic(core.clj:64)
at clojure.core$dorun.invokeStatic(core.clj:3143)
at clojure.core$doall.invokeStatic(core.clj:3149)
at clojure.core$doall.invoke(core.clj:3149)
at puppetlabs.puppetdb.jdbc$multi_insert_helper.invokeStatic(jdbc.clj:161)
at puppetlabs.puppetdb.jdbc$multi_insert_helper.invoke(jdbc.clj:155)
at puppetlabs.puppetdb.jdbc$insert_helper$fn__21903.invoke(jdbc.clj:170)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:860)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:789)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.jdbc$insert_helper.invokeStatic(jdbc.clj:170)
at puppetlabs.puppetdb.jdbc$insert_helper.invoke(jdbc.clj:164)
at puppetlabs.puppetdb.jdbc$insert_rows_BANG_.invokeStatic(jdbc.clj:189)
at puppetlabs.puppetdb.jdbc$insert_rows_BANG_.invoke(jdbc.clj:173)
at puppetlabs.puppetdb.jdbc$insert_multi_BANG_.invokeStatic(jdbc.clj:207)
at puppetlabs.puppetdb.jdbc$insert_multi_BANG_.invoke(jdbc.clj:197)
at puppetlabs.puppetdb.scf.storage$realize_paths$fn__27738.invoke(storage.clj:1216)
at clojure.core$map$fn__5935.invoke(core.clj:2772)
at clojure.lang.LazySeq.sval(LazySeq.java:42)
at clojure.lang.LazySeq.seq(LazySeq.java:51)
at clojure.lang.RT.seq(RT.java:535)
at clojure.core$seq__5467.invokeStatic(core.clj:139)
at clojure.core$dorun.invokeStatic(core.clj:3134)
at clojure.core$dorun.invoke(core.clj:3134)
at puppetlabs.puppetdb.scf.storage$realize_paths.invokeStatic(storage.clj:1212)
at puppetlabs.puppetdb.scf.storage$realize_paths.invoke(storage.clj:1203)
at puppetlabs.puppetdb.scf.storage$realize_paths.invokeStatic(storage.clj:1207)
at puppetlabs.puppetdb.scf.storage$realize_paths.invoke(storage.clj:1203)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787$fn__27791$fn__27793.invoke(storage.clj:1310)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:860)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787$fn__27791.invoke(storage.clj:1266)
at puppetlabs.puppetdb.scf.storage$fn__27782$update_facts_BANG___27787.invoke(storage.clj:1260)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115$fn__28119$fn__28121.invoke(storage.clj:1965)
at puppetlabs.puppetdb.scf.storage.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115$fn__28119.invoke(storage.clj:1962)
at puppetlabs.puppetdb.scf.storage$fn__28110$replace_facts_BANG___28115.invoke(storage.clj:1955)
at puppetlabs.puppetdb.command$do_replace_facts.invokeStatic(command.clj:401)
at puppetlabs.puppetdb.command$do_replace_facts.invoke(command.clj:399)
at puppetlabs.puppetdb.command$exec_replace_facts$fn__32666.invoke(command.clj:409)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104$fn__22117.invoke(jdbc.clj:553)
at clojure.java.jdbc$db_transaction_STAR_.invokeStatic(jdbc.clj:807)
at clojure.java.jdbc$db_transaction_STAR_.invoke(jdbc.clj:776)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection$fn__22104.invoke(jdbc.clj:549)
at puppetlabs.puppetdb.jdbc$retry_sql$attempt__22084.invoke(jdbc.clj:497)
at puppetlabs.puppetdb.jdbc$retry_sql.invokeStatic(jdbc.clj:507)
at puppetlabs.puppetdb.jdbc$retry_sql.invoke(jdbc.clj:488)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invokeStatic(jdbc.clj:546)
at puppetlabs.puppetdb.jdbc$retry_with_monitored_connection.invoke(jdbc.clj:541)
at puppetlabs.puppetdb.command$exec_replace_facts.invokeStatic(command.clj:406)
at puppetlabs.puppetdb.command$exec_replace_facts.invoke(command.clj:403)
at puppetlabs.puppetdb.command$exec_command.invokeStatic(command.clj:525)
at puppetlabs.puppetdb.command$exec_command.invoke(command.clj:510)
at puppetlabs.puppetdb.command$attempt_exec_command$fn__32843.invoke(command.clj:611)
at puppetlabs.puppetdb.command$call_with_quick_retry$fn__32832.invoke(command.clj:591)
at puppetlabs.puppetdb.command$call_with_quick_retry.invokeStatic(command.clj:589)
at puppetlabs.puppetdb.command$call_with_quick_retry.invoke(command.clj:587)
at puppetlabs.puppetdb.command$attempt_exec_command.invokeStatic(command.clj:609)
at puppetlabs.puppetdb.command$attempt_exec_command.invoke(command.clj:605)
at puppetlabs.puppetdb.command$process_cmd$fn__32899$fn__32900.invoke(command.clj:732)
at puppetlabs.puppetdb.command$process_cmd$fn__32899.invoke(command.clj:729)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317$fn__26318.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics.proxy$java.lang.Object$Callable$7da976d4.call(Unknown Source)
at com.codahale.metrics.Timer.time(Timer.java:101)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_$fn__26316$fn__26317.invoke(metrics.clj:23)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invokeStatic(metrics.clj:26)
at puppetlabs.puppetdb.utils.metrics$multitime_BANG__STAR_.invoke(metrics.clj:15)
at puppetlabs.puppetdb.command$process_cmd.invokeStatic(command.clj:726)
at puppetlabs.puppetdb.command$process_cmd.invoke(command.clj:712)
at puppetlabs.puppetdb.command$process_message.invokeStatic(command.clj:819)
at puppetlabs.puppetdb.command$process_message.invoke(command.clj:775)
at puppetlabs.puppetdb.command$message_handler$fn__32933.invoke(command.clj:853)
at puppetlabs.puppetdb.threadpool$dochan$fn__32514$fn__32515.invoke(threadpool.clj:115)
at puppetlabs.puppetdb.threadpool$gated_execute$fn__32475.invoke(threadpool.clj:68)
at clojure.lang.AFn.run(AFn.java:22)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
We also have an issue about if a lot of resources are uploaded to puppetdb (discussion on Slack), but currently I solved it by reducing frequeintly of GC (once a day instead once a hour) and switched puppet agent to puppetserver in Luxemburg (apply catalog has been changed from 90 seconds to 110 but it solves the issue with uploading catalog into puppetdb).
I can provide more logs from postgres or puppetdb in this case. In DB we saw some Locks
[root@ed-sl-b166 pg_log]# rg 'ShareLock|ExclusiveLock' postgresql-2024-05-10_000000.json -c
2536
[root@ed-sl-b166 pg_log]# rg 'ShareLock|ExclusiveLock' postgresql-2024-05-09_000000.json -c
1607
Ah sorry, forgot about that log spew from HikariCP. Those are expected warnings from our connection pool because to terminate the query consistently we have to close the entire connection.
The error that would indicate a command failed would be
Exceeded max attempts ({2}) for {3} {4}
I would need to double check our retry semantics to be 100% sure, but I think it would also be problematic if you saw
Retrying after attempt {2} for {3}, due to: {4} {5}
because that would mean our "fast retry" failed as well.
Ah, so if I have a WARN
- it will be retried even in logs I see FATAL
. So main is to have ERROR
and in this case it will be.
So, if I turn back my hosts with a big amount of resources back to puppetserver and puppetdb with high RTT - I can reproduce the issue, because now I didn't see errors, but in previous days it was like https://gist.github.com/sharewax/171b16848dedf65f30b64862634f2fdc
like host with big amount of resouces in catalog can't update during time and GC kills every hour this query.
Describe the Bug
I observe in logs of puppetdb that GC kills some connections to postgres. We have main node located in EU with settings (currently it runs GC every day, was 1 hour and for reproduce more frequient I can set it to less value):
and more nodes puppetdb which the most far located in Singapore with RTT 183 ms with settings to the same database in EU
So the issue - I observe drops of connection to postgres by GC even without some respectful waiting during phase
Starting sweep of stale reports and resource events
:So it kills 3 connection in time less than half of second (300 ms according log). this connections are actively sent something from remote location (grep by pid is below):
Expected Behavior
I suppose that GC should wait for something and drops queries only of something really terrible happens. Maybe be more patient or expand in logs why this connections to postgres should be terminated during GC. As for my side GC shouldn't be destructive.
Steps to Reproduce
I can reproduce in with every run GC with 2 puppetdb which are located quite far away (for example add by
tc
some delay between nodes)Environment