powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/
73 stars 31 forks source link

Query overview shows no data in all tabs for some queries #141

Closed dreis2211 closed 2 years ago

dreis2211 commented 2 years ago

Hi,

for certain queries I see "No Data" in all tabs of the query overview. E.g. like the following

image

This includes but is not limited to:

SELECT public.powa_take_snapshot();
REFRESH MATERIALIZED VIEW abc;
# Etc.

For others the data is perfectly shown.

The only thing that I see are some warning logs:

powa-standalone-web | /usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/compiler.py:362: SAWarning: SELECT statement has a cartesian product between FROM element(s) "anon_2", "anon_1" and FROM element "block_size".  Apply join condition(s) between each element to resolve.

powa-standalone-web | /usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/compiler.py:362: SAWarning: SELECT statement has a cartesian product between FROM element(s) "block_size" and FROM element "powa_statements".  Apply join condition(s) between each element to resolve.

powa-standalone-web | /usr/local/lib/python3.7/dist-packages/sqlalchemy/sql/compiler.py:362: SAWarning: SELECT statement has a cartesian product between FROM element(s) "most_used", "most_executed" and FROM element "least_filtering".  Apply join condition(s) between each element to resolve.

Unfortunately, I do not see any errors either locally or on the servers.

Any help is appreciated. Cheers, Christoph

banlex73 commented 2 years ago

Any errors in Postgres log on repository cluster? I had similar issue when something failed on my repo cluster..don't remember exactly what failed..

On Tue, Jul 20, 2021, 18:28 Christoph Dreis @.***> wrote:

Hi,

for certain queries I see "No Data" in all tabs of the query overview. E.g. like the following

[image: image] https://user-images.githubusercontent.com/6304496/126349955-1cdf3d12-a0af-4816-8c8f-49add780491f.png

This includes but not limited to:

SELECT public.powa_take_snapshot(); REFRESH MATERIALIZED VIEW abc;

Etc.

For others the data is perfectly shown.

Unfortunately, I do not see any errors either locally or on the servers.

Any help is appreciated. Cheers, Christoph

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-web/issues/141, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYG7IX27FOLHB74XSCLTYWI2RANCNFSM5AWBGN7A .

dreis2211 commented 2 years ago

No cluster setup, just a normal database. But anyhow - no errors there either.

rjuju commented 2 years ago

The only thing that I see are some warning logs:

Those cartesian products are wanted. I'll double check for the quals one just in case, but for block_size it's definitely wanted.

About your problem, can you confirm:

dreis2211 commented 2 years ago

On the actual instance we're using for production:

image

I have the feeling this affects statements that are not returning any rows.

SET application_name = '?'
REFRESH MATERIALIZED VIEW abc
rjuju commented 2 years ago

I see this only on the query level.

I don't understand. The screenshot you sent was at the database level, for the powa database. Do you have different problems on your production servers vs the one you showed on the screenshot? For the server you used for the initial report, was the grid empty or not?

Mostly graphs are affected, but if you give me an example of a grid view inside the query overview, I can check.

For query-level pages the only grid is for the wait events, which should be available since you use pg_wait_sampling extension.

I guess we're using remote mode (web instance is accessing data from remote servers)

Sorry if that was unclear. The remote vs local mode is about where the powa data are stored, in local (using a bgworker) or on a remote server (using powa_collector daemon). If needed here are some more details on what changed with the remote mode: https://powa.readthedocs.io/en/latest/remote_setup.html

I have the feeling this affects statements that are not returning any rows.

SET application_name = '?' REFRESH MATERIALIZED VIEW abc [...] pg_stat_kcache | 2.2.0 | public | Kernel statistics gathering

Mmm, I think that's the problem. This is because those are actually utility statements, not DML, and pg_stat_kcache don't track them (at least for now). And the UI code does an INNER JOIN between pg_stat_statements subset and pg_stat_kcache subset, not a LEFT OUTER JOIN, so you're guaranteed to never see anything in that case.

I'll work on a fix in the UI, but I think the long term best option is to also handle utility statements in pg_stat_kcache (@frost242 any opinion?). Can you test either a patch on powa-web or a custom branch? It could be nice to validate that anything I change actually fixes your problem.

rjuju commented 2 years ago

Oh, by the way I forgot to mention but the utility vs DML problem is totally unrelated to the problem you originally reported. But it might also be due to some issue when joining pg_stat_statements sampled rows with the pg_stat_kcache ones, so a UI patch might partially fix this issue too.

dreis2211 commented 2 years ago

@rjuju I'm confused by what you mean with query level or database level. For me the process is as follows: choose a server -> choose a database -> choose a query. And the data is missing when I look at the specific queries. They're properly listed in the database view of course, as otherwise I wouldn't be able to look for details on the query level.

image

As you can see, there is also nothing in the wait events for this one. As said, neither tab is showing something useful and only "No data"

The screenshot of the initial report was based on a powa-docker setup, so there might be other problems here why the particular statement didn't show something.

In the end I'm interested in having data for all statements - especially REFRESH MATERIALIZED VIEW. And since kcache is only providing additional data on top, I think a left join is at least better here as you mentioned.

rjuju commented 2 years ago

I'm confused by what you mean with query level or database level

Yes, you're right and I totally confused both pages. Sorry about that I'm a bit tired.

I'll work on a patch tomorrow then!

And since kcache is only providing additional data on top, I think a left join is at least better here as you mentioned.

For this page yes, but for the database-level page it also switch the hit-ratio graph from an "in/out of postgres cache" to "postgres cache/os cache/disk" graph, so a LEFT JOIN there can lead to misleading information as there wouldn't be any disk access reported while there could have been.

The screenshot of the initial report was based on a powa-docker setup, so there might be other problems here why the particular statement didn't show something.

Ah, it could also be the reason. I once saw a case where the initial snapshot (and only this one) didn't retrieve any row for pg_stat_kcache, leading to a similar problem (rows from both extension are sampled separately, so joining the Nth row from each side failed as they had different timestamp). If you find another occurence for a regular DML query please report it so I can investigate!

rjuju commented 2 years ago

I just pushed a new branch with this commit that should fix the graphs in all pages when pg_stat_kcache records are missing.

Can you confirm that is also fixes the problem for you?

dreis2211 commented 2 years ago

@rjuju I can't test it on the actual instance unfortunately, but I did a local test where I had COPY statements that previously showed no data either and they work with your patch.

rjuju commented 2 years ago

@dreis2211 Thanks a lot for testing! I just merge the patch on the master branch. I will release a new version later today!

rjuju commented 2 years ago

I totally forgot to do the release, sorry about that. I just released a new version now so closing this issue.