powa-team / powa-collector

PoWA Collector daemon for remote snapshot
http://powa.readthedocs.io/
PostgreSQL License
7 stars 4 forks source link

powa-collector throws an error when connecting to remote #3

Closed frost242 closed 5 years ago

frost242 commented 5 years ago

Hello,

I'm trying out the remote mode in our environment. Local setup is OK with all extensions except pg_wait_sampling.

With no remote servers configured, the collector runs fine. Then I registered a new server with powa_register_server() and run powa-collector again. I've got the following error (in debug mode) :

[postgres@powa powa-collector]$ ./powa-collector.py
2019-06-27 13:06:00,179 - INFO  : Starting powa-collector...
2019-06-27 13:06:00,179 - DEBUG : Connecting on repository...
2019-06-27 13:06:00,193 - DEBUG : Connected.
2019-06-27 13:06:00,198 somehost:1234 DEBUG : Creating worker somehost:1234: {'dsn': {'host': 'somehost', 'password': 'pouet', 'port': 1234, 'dbname': 'postgres', 'user': 'username'}, 'frequency': 300, 'srvid': 1}
2019-06-27 13:06:00,198 somehost:1234 INFO  : Starting worker
2019-06-27 13:06:00,198 - INFO  : List of workers:
2019-06-27 13:06:00,199 somehost:1234 DEBUG : Connecting on repository...
2019-06-27 13:06:00,199 - INFO  : somehost:1234: {'host': 'somehost', 'password': 'pouet', 'port': 1234, 'dbname': 'postgres', 'user': 'username'} (no connection to remote server)
2019-06-27 13:06:00,210 somehost:1234 DEBUG : Connected.
2019-06-27 13:06:00,212 somehost:1234 DEBUG : Connecting on remote database...
2019-06-27 13:06:00,244 somehost:1234 DEBUG : Connected.
2019-06-27 13:06:00,247 somehost:1234 ERROR : Error connecting on{'host': 'somehost', 'password': 'pouet', 'port': 1234, 'dbname': 'postgres', 'user': 'username'}:
could not access file "powa": No such file or directory

Exception in thread somehost:1234:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 812, in __bootstrap_inner
    self.run()
  File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 402, in run
    self.__worker_main()
  File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 193, in __worker_main
    self.__check_powa()
  File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 75, in __check_powa
    WHERE extname = 'powa'""")
InFailedSqlTransaction: current transaction is aborted, commands ignored until end of transaction block

^C2019-06-27 13:06:05,481 - DEBUG : KeyboardInterrupt caught
2019-06-27 13:06:05,481 - INFO  : Stopping all workers and exiting...
2019-06-27 13:06:05,481 somehost:1234 INFO  : Asked to stop...

a local psql connection using the connection information displayed by powa-collector is OK. I just don't get why I have a "could not access file "powa": No such file or directory" error.

Machine is an Or...le Linux 7 (RHEL7 based distro) with Python 2.7.5 and psycopg2 2.8.2. Let me know if you need further informations but unfortunately I can't connect to Irc right now.

Thanks a lot

frost242 commented 5 years ago

Btw, the query fails because powa is not installing on the remote. I just didn't except such error message.

rjuju commented 5 years ago

This is probably due to

                # make sure the GUC are present in case powa isn't in
                # shared_preload_librairies
                cur.execute("LOAD 'powa'")

Is powa-archivist properly installed on the target host?

rjuju commented 5 years ago

Yes, unfortunately what postgres returns when "LOAD powa" fails is not especially useful if you don't have the query context :/

I'll check if it's possible to get the full error context also when trapping the exception.

frost242 commented 5 years ago

Maybe you can call check_powa before issuing the LOAD command ? The powa extension should be installed to retrieve the remote datas, so it may be a good way to check if everything is OK before throwing LOAD.

rjuju commented 5 years ago

check_powa() calls __connect() beforehand. Also, if we're to put training wheels for that, I'm not sure that testing that the extension is created is good enough, as it does't really ensure that the lib is actually present and loadable. Probably just hinting something like "is powa extension properly setup?" would be enough.

Or to continue on your idea, call LOAD powa in check_powa would do the trick.

frost242 commented 5 years ago

Maybe checking existence of powa before calling LOAD will avoid some errors in the log file. Anyway, +1 for the hint.

I'm also thinking how we can avoid the LOAD command, as the production databases are running on an exotic OS (and will continue for the 4-6 months to come). Maybe I should simply fork powa-collector and powa-archivist and modify them to not issue the LOAD 'powa' command and ensure that all GUCs are set in a different manner.

rjuju commented 5 years ago

I'm curious about your use case. How are you planning do make it work without the extension? Your plan is to manually create all SQL objects on the target database, and enjoy reading the diffs on every version bump?

frost242 commented 5 years ago

It's just simply difficult to compile anything on the server - also it's a big endian system and I'm not sure that everything runs correctly (qualtstats, kcache). So my plan was to modify powa--4.0.0.sql to remove the LOAD 'powa' call. Then, ensure that all GUCs are declared in the postgresql.conf. But I know that I'll get mad at maintaining this :)

rjuju commented 5 years ago

Yes, probably :)

It's sad that current_settings() allowing missing GUC is only available on pg9.6+, otherwise we could have a simpler workaround.

rjuju commented 5 years ago

Can you check is this is frendlier with commit https://github.com/powa-team/powa-collector/commit/fff28fb8911b384cb19e0f590fd7b241101b322a on the fix_3 branch?

frost242 commented 5 years ago

The error message is way better now, but I still have an exception as traced here: 2019-06-28 15:04:39,550 somehost:1234 INFO : Starting worker 2019-06-28 15:04:39,550 somehost:1234 DEBUG : Connecting on repository... 2019-06-28 15:04:39,562 somehost:1234 DEBUG : Connected. 2019-06-28 15:04:39,565 somehost:1234 DEBUG : Connecting on remote database... 2019-06-28 15:04:39,612 somehost:1234 DEBUG : Connected. 2019-06-28 15:04:39,631 somehost:1234 ERROR : PoWA extension not found 2019-06-28 15:04:39,631 somehost:1234 INFO : Disconnecting from remote server 2019-06-28 15:04:39,631 somehost:1234 INFO : Disconnecting from repository Exception in thread somehost:1234: Traceback (most recent call last): File "/usr/lib64/python2.7/threading.py", line 812, in bootstrap_inner self.run() File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 411, in run self.__worker_main() File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 202, in worker_main self.check_powa() File "/usr/local/powa-collector/powa_collector/powa_worker.py", line 92, in __check_powa cur = self.remote_conn.cursor() AttributeError: 'NoneType' object has no attribute 'cursor'

frost242 commented 5 years ago

It seems that the try/except block is executed by check_powa, even if the prechecks went bad.

frost242 commented 5 years ago

This seems to make the job (sorry, I can't create a PR) :

diff --git a/powa_collector/powa_worker.py b/powa_collector/powa_worker.py
index 9ed8d43..4d70192 100644
--- a/powa_collector/powa_worker.py
+++ b/powa_collector/powa_worker.py
@@ -80,11 +80,13 @@ class PowaThread (threading.Thread):
                 self.logger.error("PoWA extension not found")
                 self.__disconnect_all()
                 self.__stopping.set()
+                return
             elif (res[0] < 4):
                 self.logger.error("Incompatible PoWA version, found %s,"
                                   " requires at least 4.0.0" % res[1])
                 self.__disconnect_all()
                 self.__stopping.set()
+                return

             try:
                 # make sure the GUC are present in case powa isn't in
rjuju commented 5 years ago

indeed, that was a silly mistake! I just pushed an updated patch on the branch. If that's ok for you I'll merge on master.

frost242 commented 5 years ago

It's ok for me !

rjuju commented 5 years ago

Thanks, pushed!