Open · frost242 opened this issue 3 hours ago
Hi Thomas,
Thanks for the report!
I think this is coming from the new per-db snapshot feature. Can you try a custom patch? Maybe this would be enough to fix the problem:
diff --git a/powa_collector/snapshot.py b/powa_collector/snapshot.py
index d7eb5e2..c0a1b82 100644
--- a/powa_collector/snapshot.py
+++ b/powa_collector/snapshot.py
@@ -113,6 +113,7 @@ def copy_remote_data_to_repo(cls, data_name,
buf = StringIO()
try:
data_src.copy_expert("COPY (%s) TO stdout" % data_src_sql, buf)
+ data_src.execute("RELEASE src")
except psycopg2.Error as e:
src_ok = False
err = "Error retrieving datasource data %s:\n%s" % (data_name, e)
@@ -125,6 +126,7 @@ def copy_remote_data_to_repo(cls, data_name,
try:
cls.logger.debug("Calling %s..." % cleanup_sql)
data_src.execute(cleanup_sql)
+ data_src.execute("RELEASE src")
except psycopg2.Error as e:
err = "Error while calling %s:\n%s" % (cleanup_sql, e)
errors.append(err)
@@ -142,6 +144,7 @@ def copy_remote_data_to_repo(cls, data_name,
try:
# For data import the schema is now on the repository server
data_ins.copy_expert("COPY %s FROM stdin" % target_tbl_name, buf)
+ data_ins.execute("RELEASE data")
except psycopg2.Error as e:
err = "Error while inserting data:\n%s" % e
cls.logger.warning(err)
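To make the intent of those added RELEASE calls easier to follow, here is a minimal, simplified sketch of the pattern (not the actual collector code; the connection string, savepoint name and SQL are placeholders): each step runs under a savepoint, and releasing the savepoint as soon as the step succeeds avoids piling up open subtransactions inside the snapshot transaction.

# Minimal sketch only, NOT the actual collector code: it just illustrates the
# savepoint/release pattern the patch above targets. The connection string,
# savepoint name and SQL are placeholders.
import psycopg2

conn = psycopg2.connect("dbname=powa")   # hypothetical repository connection
conn.autocommit = False
cur = conn.cursor()

for step_sql in ["SELECT 1", "SELECT 2"]:    # stand-ins for the per-datasource COPY steps
    # Each SAVEPOINT opens a subtransaction inside the snapshot transaction.
    cur.execute("SAVEPOINT data")
    try:
        cur.execute(step_sql)
        # What the patch adds: close the subtransaction as soon as the step
        # succeeds, instead of leaving every savepoint open until COMMIT.
        cur.execute("RELEASE data")
    except psycopg2.Error:
        # On failure, undo only this step and keep the snapshot going.
        cur.execute("ROLLBACK TO data")

conn.commit()
conn.close()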
I'm trying it now, I'll let you know ASAP.
Things are probably better; now I'm running into memory errors instead. I'll let you know on Monday, I think.
OK, that's at least a first piece of good news, I hope!
Also, many sessions are now experiencing LWLock waits on BufferMapping, as well as relation extension locks. I have to check this issue first.
This one looks like just a side effect of having that many servers on a fresh install. Once you hit the coalesce and purge, autovacuum should kick in and you shouldn't hit the relation extension locks anymore. For the buffer mapping, there is probably not much to do apart from lowering the snapshot frequency or excluding some databases from the per-db snapshots. Do you usually have one database per instance, or a lot?
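If you want to experiment with the frequency, something along these lines is the idea. This is only a sketch: it assumes the repository's powa_servers table exposes a per-server frequency column in seconds (as in recent PoWA versions) and that server id 0 is the local instance; please double-check your schema and the values before running anything like it, and reload the collector afterwards so it picks up the change.

# Sketch only: lower the snapshot frequency of every remote server to 10 minutes.
# Assumes a powa_servers table with id and frequency (in seconds) columns on the
# repository database; verify the schema and values for your installation first.
import psycopg2

conn = psycopg2.connect("dbname=powa")   # hypothetical repository connection
with conn, conn.cursor() as cur:
    cur.execute("UPDATE powa_servers SET frequency = %s WHERE id <> 0", (600,))
    print("%d server(s) updated" % cur.rowcount)
conn.close()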
Thanks for the feedback. I usually have one database per instance. So I'll try to solve this memory issue first.
Hello,
Just some quick feedback on using powa-collector 1.3.0 with PoWA 5.0. We have 338 PostgreSQL instances monitored by a single PoWA instance. While running, the collector generates a huge load on our server: the load average goes up to 50. I notice a lot of sessions in a waiting state with wait_event_type LWLock and wait_event SubtransSLRU.
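For reference, this kind of wait-event breakdown can be obtained with a plain pg_stat_activity aggregation on the repository server, nothing PoWA-specific (the connection string below is a placeholder):

# Sketch: count sessions per wait event on the repository server.
import psycopg2

conn = psycopg2.connect("dbname=powa")   # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT wait_event_type, wait_event, count(*)
          FROM pg_stat_activity
         WHERE wait_event IS NOT NULL
         GROUP BY 1, 2
         ORDER BY 3 DESC
    """)
    for wait_event_type, wait_event, cnt in cur.fetchall():
        print(wait_event_type, wait_event, cnt)
conn.close()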
Just a few minutes after upgrading our PoWA db to PostgreSQL 17, we see huge activity on the subtransaction SLRU:
I'm not sure that playing with subtransaction_buffers will help a lot here, as I only see hits.
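For completeness, those hit/read figures presumably come from pg_stat_slru (available since PostgreSQL 13). A minimal sketch of that check, assuming PostgreSQL 17 naming where this SLRU is reported as 'subtransaction' (older releases call it 'Subtrans'):

# Sketch: check whether the subtransaction SLRU is served from its buffers
# (blks_hit) or has to hit disk (blks_read).
import psycopg2

conn = psycopg2.connect("dbname=powa")   # placeholder connection string
with conn, conn.cursor() as cur:
    # PostgreSQL 17 reports this SLRU as 'subtransaction'; older releases
    # use 'Subtrans'.
    cur.execute("""
        SELECT name, blks_hit, blks_read, blks_zeroed
          FROM pg_stat_slru
         WHERE name IN ('subtransaction', 'Subtrans')
    """)
    for name, hit, read, zeroed in cur.fetchall():
        print("%s: hits=%s reads=%s zeroed=%s" % (name, hit, read, zeroed))
conn.close()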
Things got better after setting subtransaction_buffers to '1GB': SLRU blks_hit appears less often. However, the load goes up even more, so I have to investigate this further - probably a mix of an infrastructure issue and something else. I'll let you know. I thought it was a good idea to share this with you.