powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/

Find how we can push queryid predicate inside subquery #108

Closed anayrat closed 3 years ago

anayrat commented 4 years ago

https://github.com/powa-team/powa-web/blob/7a86b1c34124204c4a1d2d0fe2be93a2abc61df5/powa/sql/__init__.py#L299-L311

When you select a query (Query Overview view), I noticed that WHERE %(filter)s (in the predicates tab) is translated into:

    WHERE datname = 'my_base'
      AND coalesce_range && tstzrange('2020-07-01 16:26:00+0200', '2020-07-02 09:26:57+0200')
      AND s.queryid = ANY (ARRAY[7199764886139264636])

This leads to many rows being removed by the filter multiple times. If you push the queryid into the subquery above, execution goes from 13s to 300ms.
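For illustration, a minimal sketch of the kind of rewrite on a toy table (the names loosely mirror the qualstats tables but are made up; the real query is in powa/sql/__init__.py). The planner cannot push the queryid predicate through the window function on its own, because it cannot know that each qualid maps to a single queryid, so the rewrite has to be done by hand:

    -- toy stand-in for the qualstats history table
    CREATE TABLE history (qualid bigint, queryid bigint,
                          occurences bigint, constvalues text[]);

    -- before: top constants are computed for ALL queryids, then the
    -- outer WHERE throws most of the rows away
    SELECT *
    FROM (SELECT qualid, queryid, constvalues,
                 row_number() OVER (PARTITION BY qualid
                                    ORDER BY occurences DESC) AS rn
          FROM history) s
    WHERE s.queryid = ANY (ARRAY[7199764886139264636])
      AND s.rn <= 20;

    -- after: the predicate is applied inside the subquery, so the
    -- inner scan (and any index on queryid) skips rows early; safe
    -- here only because each qualid belongs to a single queryid
    SELECT *
    FROM (SELECT qualid, queryid, constvalues,
                 row_number() OVER (PARTITION BY qualid
                                    ORDER BY occurences DESC) AS rn
          FROM history
          WHERE queryid = ANY (ARRAY[7199764886139264636])) s
    WHERE s.rn <= 20;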

brainexe commented 4 years ago

I also analyzed the query on our databases and it's really sloooow -> the /explains/ endpoint has an avg response time of 30s

here is the explain analyze of a 32s execution of qual_constants() https://explain.depesz.com/s/EQfh

banlex73 commented 4 years ago

To fix the performance issue, I had to partition many tables (I am on PG 12).

anayrat commented 4 years ago

To fix the performance issue, I had to partition many tables (I am on PG 12).

Which ones? Maybe you should try with master; we committed several performance fixes.

anayrat commented 4 years ago

I also analyzed the query on our databases and it's really sloooow -> the /explains/ endpoint has an avg response time of 30s

here is the explain analyze of a 32s execution of qual_constants() https://explain.depesz.com/s/EQfh

Could you post your query source? I want to test pushing the queryid down into the subquery.

brainexe commented 4 years ago

Sure, this is the query that got executed: https://pastebin.com/AWr9bUvc

anayrat commented 4 years ago

Can you try EXPLAIN (ANALYZE) with this query? https://pastebin.com/rj7PZy8r

brainexe commented 4 years ago

I had to adapt the timeframe, as the old one had already been purged. Without the changes, and with the new timeframe (1h), it's still at ~32s.

With your suggested changes: amaaaaazing, 182ms! Full explain: https://explain.depesz.com/s/P31c

anayrat commented 4 years ago

Nice! I will try to work on this but I am afraid it is not a trivial change.

banlex73 commented 4 years ago

Here is the list of partitioned tables with their sizes: [image: image.png — not rendered; the numbers are reposted as text in a comment below]

I monitor 22 servers with 1050 DBs on them, with a snapshot frequency of 120s and 3 days of data retention.

anayrat commented 4 years ago

I can't see the picture, but that's awesome! Maybe you can try with the recent fixes; they should solve several performance issues and improve graph rendering. I hope to fix this issue next week.

banlex73 commented 4 years ago

Table                                        Size (bytes)   Size (GB)
powa_all_relations_history                   23983415296    22.34
powa_all_relations_history_current           99348619264    92.53
powa_wait_sampling_history_current           9542721536     8.89
powa_wait_sampling_history                   7900930048     7.36
powa_statements_history_current              4417323008     4.11
powa_statements_history                      21049344000    19.60
powa_kcache_metrics                          15304335360    14.25
powa_kcache_metrics_current                  11699191808    10.90
powa_qualstats_constvalues_history_current   618078208      0.58
powa_qualstats_quals_history_current         173416448      0.16

banlex73 commented 4 years ago

I know partitioning the tables was a big change, and it would be better to fix or improve the queries, but as you can see my tables are huge. The only reasonable solution that came to mind was to partition them on PG 12!

rjuju commented 4 years ago

@banlex73 that's impressive!

Just one clarification: is the size of powa_statements_history from before or after the patch for purging unused statements?

The per-relation tables are by far the biggest consumers, which is unsurprising given the number of databases you have. I'm wondering if lowering powa_coalesce could help decrease the size of powa_all_relations_history_current without completely ruining powa_all_relations_history. That would also help if you're usually checking the last hour or so.

Did you try to change that parameter?
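For reference, a minimal sketch of changing it, assuming a local-mode setup where this is exposed as the powa.coalesce GUC (documented default 100; remote-mode setups configure it per server):

    -- aggregate the *_current tables into the *_history tables
    -- every 50 snapshots instead of the default 100
    ALTER SYSTEM SET powa.coalesce = 50;
    SELECT pg_reload_conf();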

banlex73 commented 4 years ago

Yes, the patch for purging unused statements was applied. I think the reason the table is huge is this: pg_stat_statements.max = '20000' and pg_stat_statements.track = 'all'. I set max very high to catch as many statements as possible. Maybe I am wrong, but the idea is:

I expect that every time I take a snapshot on my clusters (every 2 min), I catch all queries, including expensive but infrequent ones. Say a report was run once, took ~60s, and I want to see its statistics; with a low pg_stat_statements.max I cannot catch this query. Does that make sense to you? Regarding powa_coalesce, I didn't understand how it works, so let me set it to 50, say. I will update you tomorrow on how it's going. Thanks again.
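For context, these are the settings in question as they would appear in postgresql.conf (the shared_preload_libraries line is the usual prerequisite for pg_stat_statements):

    # postgresql.conf — track many distinct statements so that
    # rare-but-expensive queries survive until the next snapshot
    shared_preload_libraries = 'pg_stat_statements'
    pg_stat_statements.max = 20000      # default is 5000
    pg_stat_statements.track = 'all'    # also track nested statements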

rjuju commented 4 years ago

I think the reason the table is huge is this: pg_stat_statements.max = '20000' and pg_stat_statements.track = 'all'. I set max very high to catch as many statements as possible.

Do you know how many distinct (queryid, userid, dbid) entries you have on the servers, and whether at that number pgss is still evicting infrequent queries that run between two snapshots? According to https://github.com/powa-team/powa/issues/133 you seem to have a non-stable set of queries for some reason, but I'm still unsure why.
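One hypothetical way to check, querying the pg_stat_statements view directly on a monitored server:

    -- if the number of tracked entries sits at pg_stat_statements.max,
    -- pgss is almost certainly evicting entries between snapshots
    SELECT count(*) AS tracked_entries,
           current_setting('pg_stat_statements.max') AS max_entries
    FROM pg_stat_statements;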

but the idea is:

  • take a snapshot
  • reset statements

This could be tried, but I'm afraid it could lead to poor UI graphs. Note that if you want to try, you can easily do that by implementing a "fake datasource" (see https://powa.readthedocs.io/en/latest/components/powa-archivist/development.html), with the snapshot function only calling the required reset function. Just make sure to register it with a lower priority so it's called last. A rough sketch follows.
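Illustrative only: the snapshot-function signature and the powa_functions registration columns vary across powa-archivist versions, so check the linked development docs before using anything like this:

    -- a snapshot function that only resets pg_stat_statements
    CREATE FUNCTION powa_pgss_reset_snapshot(_srvid integer)
    RETURNS void AS $$
    BEGIN
        PERFORM pg_stat_statements_reset();
    END;
    $$ LANGUAGE plpgsql;

    -- register it as a datasource, with a lower priority so it runs
    -- after the real datasources (column names assumed here)
    INSERT INTO powa_functions (srvid, module, operation, function_name,
                                added_manually, enabled, priority)
    VALUES (0, 'pg_stat_statements_reset', 'snapshot',
            'powa_pgss_reset_snapshot', true, true, -1);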

Regarding powa_coalesce, I didn't understand how it works, so let me set it to 50, say.

This is the number of snapshots that have to be taken before an aggregation is done (everything is per remote server). The goal is to pack multiple records into an array, to benefit from compression and to minimize per-tuple overhead. I wrote a blog post about it if you're interested in more details: https://rjuju.github.io/postgresql/2016/09/16/minimizing-tuple-overhead.html
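Conceptually, the aggregation looks like this on a toy schema (the real powa tables differ; this only illustrates the packing):

    -- N per-snapshot rows per queryid...
    CREATE TABLE metrics_current (queryid bigint, ts timestamptz, calls bigint);

    -- ...become a single row holding an array of records, saving the
    -- ~24-byte per-tuple header and compressing much better
    CREATE TYPE metric_record AS (ts timestamptz, calls bigint);
    CREATE TABLE metrics_history (queryid bigint, records metric_record[]);

    INSERT INTO metrics_history
    SELECT queryid,
           array_agg(ROW(ts, calls)::metric_record ORDER BY ts)
    FROM metrics_current
    GROUP BY queryid;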

anayrat commented 3 years ago

@brainexe can you try this patch: https://github.com/powa-team/powa-web/pull/110.patch

banlex73 commented 3 years ago

Thanks a lot for your advice! I will try to implement it and let you know.

brainexe commented 3 years ago

Just an update from my side: with the changes, our 30s query is now at ~1s 🚀

rjuju commented 3 years ago

Wow, nice! Good job @anayrat!

Thanks a lot for the feedback @brainexe!

anayrat commented 3 years ago

Yeah!!