powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/
73 stars 31 forks source link

Slow to load database view when query lengths are substantional #160

Closed Mayzie closed 1 month ago

Mayzie commented 1 year ago

Hi there!

When trying to browse into a database and at a time when there has been a large query (length of a query string like 7,000,000 characters), basically the entire view takes ages to load (and load it does after some length of time) with the view just being a loading spinner. When it loads, hovering over the long queries causes it to try to display the entire query.

I don't really know how else to expand on this. Basically, long query string length = very slow and unusable PoWA web.

I think the best course of action in such a case is to truncate the query before sending it to the web application (substring(pg_stat_statements.query, ...)), and to return the length of the query as well (octet_length(pg_stat_statements.query) and/or length(pg_stat_statements.query)), and display as such in the user interface.

Yeah I know I shouldn't have such long queries, it was a mistake on my part. But it revealed a flaw. 🙂

rjuju commented 1 year ago

Hi,

Thanks a lot for the report! That's certainly not something we've beeing testing, so I can imagine that it can lead to unexpected problems in various places of the code.

I will try to reproduce it locally, but in the mean time one thing you could do is open the "developer tools" on your browser (usually available with F12), go to the network tab and refresh one of the slow page. You should then be able to see all calls from the browser to the powa-web app, and see if (and how much) the slowness comes from the backend part.

Mayzie commented 1 year ago

Thanks rjuju!

It seems to stall on this endpoint: metrics/database_all_queries_waits/<dbname>.

rjuju commented 1 year ago

Thanks! Well, that's very surprising as this endpoint shouldn't access the query text at all.

Which version of powa-web are you using? I know there were some inefficiencies in some of the SQL statements in previous version, but the last one is https://github.com/powa-team/powa-web/commit/d3e6137f91d21176f406922bf3bc5c76c13c9f4c, which was fixed in version 4.1.0.

Mayzie commented 1 year ago

I'm running 4.1.3-1.pgdg22.04+1 (via pgdg repositories on Ubuntu 22.04). Maybe I'm wrong and it's not the query length that's setting it off? But querying pg_stat_statements directly shows an UPDATE query of size ~7 million bytes (affecting about 100,000 rows and total_exec_time of ~6,000,000). There is also a SELECT of size ~4 million bytes (affecting 400,000 rows and having a total_exec_time of ~500,000). 🤷‍♀️

As of writing this reply, metrics/database_all_queries_waits/<dbname> is still stalled and waiting for a response.

rjuju commented 1 year ago

Such a massive statement is more likely to cause slowdown when processing it in postgres than in powa I think (if it's an OLTP-like query). But in absolute it's not that big, so it shouldn't cause 10+ minutes to retrieve it from the backend.

As of writing this reply, metrics/database_all_queries_waits/ is still stalled and waiting for a response.

Wow, that's really unexpected. Do you see anything in the postgres logs of the powa repository server? Are you using local mode or remote mode?

One thing you could try to check if there's something wrong in pg_wait_sampling code/data or if it's really the query text is temporarily disable pg_wait_sampling extension for that server (either local server or remote server). Once done you can refresh the page and it won't execute that endpoint. If everything is fast, we will know where to focus.

I can give you the exact queries to execute for that once you give me more details about your setup.

Mayzie commented 1 year ago

It's a local database.

I disabled pg_wait_sampling (by removing it from shared_preload_libraries and restarting Postgres), but powa-web still stalls on that endpoint, although this time I get a nice big red banner at the top stating:

<local>: powa_take_snapshot(0): function "powa_wait_sampling_snapshot) failed: pg_wait_sampling shared memory wasn't initialized yet

At this time the web application is still frozen and I am unable to interact with any part of it, just like in the initial issue.

EDIT: Sorry, it's a bit late here. I'm going to have to revisit this issue tomorrow 🙂

rjuju commented 1 year ago

Yes, that's because powa configuration still says that you have the extension enabled. You can re-enable it and restart postgres. Disabling an extension for powa should can be done dynamicalle, it's just (for local server):

UPDATE powa_functions SET enabled = false WHERE module = 'pg_wait_sampling' AND srvid = 0;

and then you can try to refresh the UI. And then if you want:

UPDATE powa_functions SET enabled = true WHERE module = 'pg_wait_sampling' AND srvid = 0;

to re-enable it again, so that you can keep aggregating data, and try to see them in the UI.

EDIT: Sorry, it's a bit late here. I'm going to have to revisit this issue tomorrow 🙂

No worries, I'm in GMT +8 so it's also quite late here :)

Mayzie commented 1 year ago

UPDATE powa_functions SET enabled = false WHERE module = 'pg_wait_sampling' AND srvid = 0;

Ran the above.

Once done you can refresh the page and it won't execute that endpoint. If everything is fast, we will know where to focus.

Yup, everything is fast again! 🙂 Although, it still executes the endpoint (returns a JSON array of 3 elements, each being a ClientWrite and shows them under the "Wait events for all queries" section of the interface) - unsure if this is still expected?

rjuju commented 1 year ago

Ah right, for local servers we don't use the powa catalog to filter things in the UI, as we have a cheap and direct access to the real information. Instead it's necessary to drop the extension on the powa database. This will call an event trigger which will do the necessary cleanup in the powa_functions table. Creating the extension again will call another event trigger which will reenable the extension in powa_functions table.

So, what this means is that there wasn't a lot of activity on that database in the last hour, which is why the endpoint is fast, and doesn't return a lot of data. If you change the time interval to about 16 hours ago, it should be slow again.

If that's the case, and assuming you have more than 24h data retention, we can try to diagnose the slowdown. Is there any chance you can do some partial dump of the powa tables? I would only need powa_statements, powa_databases, powa_wait_sampling_history and powa_wait_sampling_history_current I think. On those, the only one that can contain sensitive information is powa_databases (if you care about your database names), and powa_statements obviously. I don't need those information anyway, so you could restore those data in a new database, replace the content with any random data and dump it again and it should still be enough to reproduce the problem locally.

If that's too problematic, I can help you to do the analysis on your database by providing various statements to find the source of the problem, but it will require quite a lot of back and forth.