powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/
73 stars 31 forks source link

After upgrading Repository database from 11.7 to 14.5 version, all the remote instance entries are gone from powa_servers view and powa-web does not display any instances #178

Closed hrawulwa closed 11 months ago

hrawulwa commented 11 months ago

After the repository database got upgraded, the remote instance entries are gone. powa-web also does not display any instances. I see the below message on the top of screen.

:powa_take_snapshot(0): function "powa_wait_sampling_snapshot" failed: pg_wait_sampling shared memory wasn't initialized yet. I also tried to register one remote instance. However snapshot doesn't happen. During the registration using powa_register_server function, even though the remote instance got registered, I got below warning message: WARNING: Could not activate extension hypopg on server sl...001:5432 I verified hypopg extension is available in the remote instance. What has gone wrong after the upgrade? How do I start registering and monitor the remote instances again? Please advise. Thanks Hari
banlex73 commented 11 months ago

Hey Hari, similar error was described here https://github.com/postgrespro/pg_wait_sampling/issues/1 I suspect shared_preload_libraries doesn't have pg_wait_sampling or PG cluster wasn't restarted

чт, 20 лип. 2023 р. о 15:40 hrawulwa @.***> пише:

After the repository database got upgraded, the remote instance entries are gone. powa-web also does not display any instances. I see the below message on the top of screen. :powa_take_snapshot(0): function "powa_wait_sampling_snapshot" failed: pg_wait_sampling shared memory wasn't initialized yet.

I also tried to register one remote instance. However snapshot doesn't happen. During the registration using powa_register_server function, even though the remote instance got registered, I got below warning message: WARNING: Could not activate extension hypopg on server sl...001:5432 I verified hypopg extension is available in the remote instance.

What has gone wrong after the upgrade? How do I start registering and monitor the remote instances again? Please advise.

Thanks Hari

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/178, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYBZ7TKDV3H3YXYMOYDXRGXVHANCNFSM6AAAAAA2SB6EHU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

hrawulwa commented 11 months ago

Hi, Note that this is remote setup. Since I'm not monitoring repository server and only monitoring remote instances, shared_preload_libraries need not contain any extensions. All it needs is powa archivist extension. All the stats extensions are defined in the remote instances. It used to work before with this setup. I'm not sure if I need to resintall powa-archivist in the repository server to fix this issue?

Thanks Hari

banlex73 commented 11 months ago

I might be wrong but powa_take_snapshot(0) means that it is trying to take a snapshot for localhost, usually it doesn't try to do it (powa_servers.frequency column has -1 for the localhost) and I suspect that for localhost you setup pg_wait_sampling Please try: select * from powa_extensions ; In my case, it shaws all the extensions I setup srvid | extname | version -------+--------------------+--------- 0 | pg_stat_statements | 0 | powa | 0 | pg_stat_kcache | 0 | pg_qualstats | 0 | pg_track_settings | 0 | pg_wait_sampling |

чт, 20 лип. 2023 р. о 17:14 hrawulwa @.***> пише:

Hi, Note that this is remote setup. Since I'm not monitoring repository server and only monitoring remote instances, shared_preload_libraries need not contain any extensions. All it needs is powa archivist extension. All the stats extensions are defined in the remote instances. It used to work before with this setup. I'm not sure if I need to resintall powa-archivist in the repository server to fix this issue?

Thanks Hari

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/178#issuecomment-1644810340, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYH3DK2QRBTIGDXSNKTXRHCWFANCNFSM6AAAAAA2SB6EHU . You are receiving this because you commented.Message ID: @.***>

hrawulwa commented 11 months ago

Yes, I get the exact same output as yours. I noticed one difference in shared_preload_libraries before and after upgrade on the repository server.

shared_preload_libraries = 'pg_stat_statements,auto_explain,pgstattuple,pgfincore,pg_buffercache,btree_gist,powa'

shared_preload_libraries = 'pg_stat_statements,auto_explain,pgstattuple,pgfincore,pg_buffercache,btree_gist,powa,pg_stat_kcache'

I see one additional entry pg_stat_kcache, but I suspect that might not be a problem. Does dropping and recreating pg_wait_sampling extension help in this case?

Thanks Hari

rjuju commented 11 months ago

Yes, powa_take_snapshot(0) means that this is a snapshot for the local instance. If you're using the remote mode (and thus using powa-collector), powa_take_snapshot(0) shouldn't be called unless you have powa in the shared_preload_libraries of the repository server (which is the only reason why you would need to have this configuration). In that case it will try to do regular snapshot using the extensions found. I'm assuming that you "inherited" the various extensions from the pg11 server but they're not really configured now.

Anyway, just remove powa from shared_preload_library and that specific problem will go away.

banlex73 commented 11 months ago

What I would do:

  1. update powa_servers set frequency=-1 where id=0; -- to disable snapshots for localhost
  2. select powa_deactivate_extension(0,'pg_wait_sampling');

Should be good PS: pg_stat_kcache don't think it is related

чт, 20 лип. 2023 р. о 17:38 hrawulwa @.***> пише:

Yes, I get the exact same output as yours. I noticed one difference in shared_preload_libraries before and after upgrade on the repository server.

shared_preload_libraries =

'pg_stat_statements,auto_explain,pgstattuple,pgfincore,pg_buffercache,btree_gist,powa' shared_preload_libraries = 'pg_stat_statements,auto_explain,pgstattuple,pgfincore,pg_buffercache,btree_gist,powa,pg_stat_kcache'

I see one additional entry pg_stat_kcache, but I suspect that might not be a problem. Does dropping and recreating pg_wait_sampling extension help in this case?

Thanks Hari

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/178#issuecomment-1644825756, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYHL6GMJZLEHAU4SCC3XRHFPPANCNFSM6AAAAAA2SB6EHU . You are receiving this because you commented.Message ID: @.***>

hrawulwa commented 11 months ago

After deactivating pg_wait_sampling extension for local instance, the error went away from the UI. Also, it started collecting snapshots for one remote instance which I added. But the question remains, how come all the remote instances entries (around 15 of them) got deleted from powa_servers after the DB upgrade? Is this normal and do we need to re-register the instance after every upgrade?

Thanks Hari

rjuju commented 11 months ago

It's not expected to lose either the remote server definitions nor their metrics.

How did you upgrade the repository server?

hrawulwa commented 11 months ago

The upgrade was done using pg_upgrade method. I will have to get more details from the team if there were any issues during the upgrade. I will let you know once I know something.

Thanks Hari

banlex73 commented 11 months ago

my understanding of how pg_upgrade works, it should not change anything... but to dig it deeper, you better check all upgrade logs. BTW, happy to hear that deactivating pg_wait_sampling extension helped.

чт, 20 лип. 2023 р. о 18:26 hrawulwa @.***> пише:

The upgrade was done using pg_upgrade method. I will have to get more details from the team if there were any issues during the upgrade. I will let you know once I know something.

Thanks Hari

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/178#issuecomment-1644863164, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYEDOKKOUK7A2UDOTVLXRHLE3ANCNFSM6AAAAAA2SB6EHU . You are receiving this because you commented.Message ID: @.***>

rjuju commented 11 months ago

@banlex73 indeed

@hrawulwa there's a known postgres bug during pg_upgrade if you have extensions in shared_preload_librairies that can do db writes, like powa does (see https://github.com/powa-team/powa/issues/176 or https://powa.readthedocs.io/en/latest/components/powa-archivist/installation.html#major-postgresql-upgrade). Maybe that would explain your issue?

Could you share the output of

SELECT * FROM powa_servers;

Also, is installing pageinspect on the powa database of the repository server an option?

hrawulwa commented 11 months ago

Ok, shared_preload_librairies did have powa extension. Let me explain how upgrade process happened:

  1. Install 14.5 software in a new HOME
  2. Drop extensions from the current 11.7 version
  3. Configure and initialize new 14.5 version
  4. Stop Postgres on 11.7 home
  5. Perform upgrade using pg_upgrade
  6. Install and compile extensions on the new upgraded 14.5 home
  7. Recreate the extensions on the upgraded database

You can see that all the extensions including powa were dropped on current version, and recreated on the upgraded database. Does dropping and recreating powa extension delete all the metrics data? Or do you think, data is lost due to bug during pg_upgrade where background writer like powa which caused corruption?

Regarding the output of powa_servers you requested, there is only one entry for local server. As per previous suggestion, deactivating pg_wait_sampling for local server fixed the error I was seeing in the UI.

Thanks Hari

banlex73 commented 11 months ago

I'm pretty sure, once you dropped powa extension all the metric data, everything was dropped.

On Thu, 20 Jul 2023, 22:20 hrawulwa, @.***> wrote:

Ok, shared_preload_librairies did have powa extension. Let me explain how upgrade process happened:

  1. Install 14.5 software in a new HOME
  2. Drop extensions from the current 11.7 version
  3. Configure and initialize new 14.5 version
  4. Stop Postgres on 11.7 home
  5. Perform upgrade using pg_upgrade
  6. Install and compile extensions on the new upgraded 14.5 home
  7. Recreate the extensions on the upgraded database

You can see that all the extensions including powa were dropped on current version, and recreated on the upgraded database. Does dropping and recreating powa extension delete all the metrics data? Or do you think, data is lost due to bug during pg_upgrade where background writer like powa which caused corruption?

Regarding the output of powa_servers you requested, there is only one entry for local server. As per previous suggestion, deactivating pg_wait_sampling for local server fixed the error I was seeing in the UI.

Thanks Hari

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/178#issuecomment-1644996794, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYAXOBEU27WHG2IYT5TXRIGQPANCNFSM6AAAAAA2SB6EHU . You are receiving this because you were mentioned.Message ID: @.***>

rjuju commented 11 months ago

Does dropping and recreating powa extension delete all the metrics data?

oh, yes indeed :( If you drop the powa extension you delete all remote servers configuration, all the metrics and every related data. There shouldn't be any need to drop the extension before upgrading, just making sure that no snapshot will be done (either setting powa.frequency = -1 or removing powa from shared_preload_libraries).

Or do you think, data is lost due to bug during pg_upgrade where background writer like powa which caused corruption?

if the extension is dropped, not snapshot can happen therefore no writes will be performed by powa and the bug cannot happen.

hrawulwa commented 11 months ago

OK great. Thanks for confirmation. So, for future upgrades, what is the recommendation?

  1. Do not drop the powa extension
  2. Remove the powa extension from shared_preload_libraries before the upgrade
  3. Add the powa extension back to shared_preload_libraries after the upgrade
  4. Update the powa extension in the upgraded database.

Do you think the above steps look good?

Thanks Hari

rjuju commented 11 months ago

Yes, that sound about right.

But again, I'd like to point out that having powa in shared_preload_libraries is only needed if you want to take snapshots for the repository server. If this is a dedicated postgres server and you don't need powa metrics for it, there's no need for that.

hrawulwa commented 11 months ago

Yes, thanks for clarifying. I already removed powa from shared_preload_libraries in the repository server. I have got another related question. What will happen when the upgrade happens on the Remote servers which are registered to the repository server? The remote servers has all the Stats extensions including powa. As part of the upgrade, extensions will be dropped and recreated. I believe this will not cause an issue, as the data is never stored on the remote servers. It is stored on the dedicated repository server. Please confirm.

Thanks Hari

rjuju commented 11 months ago

On the remote servers the powa extension indeed doesn't store anything, it's only used to provide a consistent and backward compatible view of the various stat extensions. For instance if new columns are added to pg_stat_statements (like the planning counters a few versions ago) but the postgres versions (or extension version) doesn't have it, powa will take care of returning 0, NULL or whatever is sensible for any missing field. So whether the extensions are dropped and recreated on the remote server doesn't matter at all.

That being said, pg_upgrade doesn't drop and recreate the extensions. It uses a special "binary mode" for pg_dump which will look into extensions dependencies (tables, functions, data types and so on) and will recreate the extension content exactly the same way as it was on the source server rather than doing a simple "CREATE EXTENSION ... VERSION ...", so any stored data would be preserved (you could for instance activate local snapshot for the remote server, which wouldn't be that useful as you would have the same data on the repository, but it's still possible).

Similarly, a pg_dumpall | pg_restore approach would also restore all the data, as powa extensions properly configures how pg_dump should save any table during a logical backup.

hrawulwa commented 11 months ago

Thanks for clarifying on the remote servers question. Yes, I agree that pg_upgrade doesn't drop and recreate the extensions. I forgot to mention that we use custom script to perform upgrade. As part of the pre-check, the extensions are dropped, and then the script calls the actual pg_upgrade function to perform the upgrade. I will need to understand why the team has decided to drop the extensions prior to upgrade. We performed upgrades on many remote servers without issues, but ran into issue for this particular repository server, where the powa extension actually holds the data, unlike other extensions. Will be careful going forward.

Thanks again for the help.

Thanks Hari

rjuju commented 11 months ago

Is that a script specific for a powa server or any of your postgres servers?

A long time ago there was an issue in pg_qualstats that prevented usage of pg_upgrade, but this was fixed in 2017 (see https://github.com/powa-team/pg_qualstats/commit/3c85a657bc06d6d539c729ad6432fbec5d4c7e41). So if the script is for powa and has been written 6 years ago maybe this was the reason. In any case this was a bug that was fixed, you shouldn't need to drop any of the powa extension before running pg_upgrade.

hrawulwa commented 11 months ago

The script is a generic one developed for any Postgres server. The upgrades are going fine on other servers, but hit with this issue just on Powa server. Will keep in mind not to drop extensions on Powa server for future upgrades.

Thanks Hari

rjuju commented 11 months ago

I'm not sure why you would want to drop any extension at all when doing a pg_upgrade, and the risk of losing data exists with many extensions. If that was done because of some specific issue you should try to investigate on what the real problem was, and try to fix it or report a bug to either the extension or postgres.

In any case, this isn't related to powa anymore so I'm closing this issue.