powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/

"Optimize this db" not working #44

Closed asr1901 closed 5 years ago

asr1901 commented 6 years ago

I pressed "Optimize this database" for a couple of different databases in the PoWA web UI. Unfortunately, no data is shown, and this is the message from the error log:


    HTTPRequest(protocol='http', host='********:8888', method='GET', uri='/metrics/database/dev6/wizard/?from=2017-10-27+00%3A00%3A00-0400&to=2017-10-27+12%3A00%3A26-0400', version='HTTP/1.1', remote_ip='192.168.110.61', body='', headers={'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Host': '********:8888', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:56.0) Gecko/20100101 Firefox/56.0', 'Dnt': '1', 'Connection': 'keep-alive', 'X-Requested-With': 'XMLHttpRequest', 'Referer': 'http://********:8888/database/dev6/overview/?to=2017-10-27T12%3A00%3A26-04%3A00&from=2017-10-27T00%3A00%3A00-04%3A00', 'Cookie': '_flashes="Ti4=|1509120139|17fa876c77c0dcac43e1daf8c1b30186e74eebe1"; username="cG93YQ==|1509064881|df051eff189db9683486f588a362dd1739b1cb69"; password=bmV4bW9z|1509064881|653b9957d9d47b92f23074d1769b030376653359; server="bWFpbg==|1509064881|2f3437daf7b167384e923938669cc8e7a761f6e5"'})
    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/tornado/web.py", line 1021, in _execute
        getattr(self, self.request.method.lower())(*args, **kwargs)
      File "/usr/lib/python2.6/site-packages/powa/dashboards.py", line 86, in get
        data = {"data": [self.process(val, **url_params) for val in values]}
      File "/usr/lib/python2.6/site-packages/powa/dashboards.py", line 104, in process
        return dict(val)
      File "/usr/lib/python2.6/site-packages/powa/sql/compat.py", line 23, in process
        return json_deserializer(value.decode(encoding))
    AttributeError: 'list' object has no attribute 'decode'
[E 171027 12:02:25 web:1447] 500 GET /metrics/database/dev6/wizard/?from=2017-10-27+00%3A00%3A00-0400&to=2017-10-27+12%3A00%3A26-0400 (192.168.110.61) 1973.05ms
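
For context: the AttributeError comes from PoWA's JSON compatibility shim (powa/sql/compat.py), which assumes the driver returns json columns as raw bytes. psycopg2 2.5+ already deserializes json/jsonb columns into Python lists/dicts, so value.decode() receives a list. Below is a minimal defensive sketch of such a result processor; the function name and signature are illustrative, only the failing line is taken from the traceback:

    import json

    def process_json_value(value, encoding='utf-8', json_deserializer=json.loads):
        # Only decode when the driver hands back raw bytes; newer
        # psycopg2 already returns json/jsonb as Python objects.
        if value is None:
            return None
        if isinstance(value, bytes):
            return json_deserializer(value.decode(encoding))
        if isinstance(value, str):
            return json_deserializer(value)
        return value  # already a list/dict: pass it through unchanged
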
asr1901 commented 6 years ago

Also, one query on this same db doesn't show any quals/predicates data, and I now see this behavior on multiple queries. This is the error in the log file when I pull up the query:

    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/tornado/web.py", line 1021, in _execute
        getattr(self, self.request.method.lower())(*args, **kwargs)
      File "/root/powa-web-master/powa/query.py", line 207, in get
        self.get_argument("to"))
      File "/root/powa-web-master/powa/sql/__init__.py", line 402, in get_any_sample_query
        _from, _to, 'most executed')
      File "/root/powa-web-master/powa/sql/__init__.py", line 374, in get_unjumbled_query
        values[kind].get('constants', []))
    TypeError: 'NoneType' object is unsubscriptable
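
Here the traceback shows that values itself is None when powa/sql/__init__.py line 374 subscripts it, which is consistent with pg_qualstats having recorded no example constants for that query. A hypothetical guard illustrating the shape of a fix (the names mirror the traceback, not the actual powa-web code):

    def get_sample_constants(values, kind):
        # Return the recorded constants for one sample kind (e.g.
        # 'most executed'), or None when nothing was captured for
        # this query, instead of subscripting None.
        if values is None or values.get(kind) is None:
            return None
        return values[kind].get('constants', [])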

rjuju commented 6 years ago

First of all, can you give the versions of the following dependencies: sqlalchemy, psycopg2 and tornado?

Then, does this query (on the powa database) return rows? (You may have to change the timestamps to match the interval you requested.)

SELECT powa_databases.datname, array_agg(queryid) AS queryids, qualid,
       CAST(quals AS JSONB) AS quals, occurences, execution_count,
       array_agg(query) AS queries, avg_filter, filter_ratio
FROM (
    SELECT qualid, powa_statements.queryid AS queryid, query,
           powa_statements.dbid AS dbid, to_json(quals) AS quals,
           sum(execution_count) AS execution_count,
           sum(occurences) AS occurences,
           sum(nbfiltered) / sum(occurences) AS avg_filter,
           CASE WHEN (sum(execution_count) = 0) THEN 0
                ELSE (sum(nbfiltered) / CAST(sum(execution_count) AS NUMERIC)) * 100
           END AS filter_ratio
    FROM (
        SELECT queryid, qualid, (unnested.records).*
        FROM (
            SELECT pqnh.qualid, pqnh.queryid, pqnh.dbid, pqnh.userid,
                   pqnh.coalesce_range, unnest(records) AS records
            FROM powa_qualstats_quals_history pqnh
            WHERE coalesce_range && tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]')
        ) AS unnested
        WHERE tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]') @> (records).ts
        UNION ALL
        SELECT queryid, qualid, pqnc.ts, pqnc.occurences, pqnc.execution_count, pqnc.nbfiltered
        FROM powa_qualstats_quals_history_current pqnc
        WHERE tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]') @> pqnc.ts
    ) h
    JOIN powa_qualstats_quals pqnh USING (queryid, qualid)
    JOIN powa_statements ON powa_statements.queryid = pqnh.queryid
    GROUP BY qualid, powa_statements.queryid, powa_statements.dbid,
             powa_statements.query, quals
) AS anon_1
JOIN powa_databases ON powa_databases.oid = dbid
WHERE avg_filter > 1000 AND filter_ratio > 0.3
GROUP BY qualid, execution_count, occurences, CAST(quals AS JSONB),
         avg_filter, filter_ratio
ORDER BY occurences DESC
LIMIT 200

Do you have queries with constants in the SELECT list? (for instance, SELECT 123, 'hello' FROM ...)

asr1901 commented 6 years ago

python-sqlalchemy-0.5.5-3.el6_2.noarch
python-sqlalchemy0.8-0.8.2-4.el6.x86_64
python-psycopg2-2.7.3-1.rhel6.x86_64
python-tornado-2.3-0.noarch

The SQL query you gave me fails with:

    ERROR:  column "powa_databases.datname" must appear in the GROUP BY clause or be used in an aggregate function
    LINE 1: SELECT powa_databases.datname, array_agg(queryid) AS queryid...

Yes, there are constants in the SELECT list, e.g. "SELECT 'abc'"

rjuju commented 6 years ago

Ah sorry, I mangled the last copy/paste; here is the corrected query:

SELECT powa_databases.datname, array_agg(queryid) AS queryids, qualid,
       CAST(quals AS JSONB) AS quals, occurences, execution_count,
       array_agg(query) AS queries, avg_filter, filter_ratio
FROM (
    SELECT qualid, powa_statements.queryid AS queryid, query,
           powa_statements.dbid AS dbid, to_json(quals) AS quals,
           sum(execution_count) AS execution_count,
           sum(occurences) AS occurences,
           sum(nbfiltered) / sum(occurences) AS avg_filter,
           CASE WHEN (sum(execution_count) = 0) THEN 0
                ELSE (sum(nbfiltered) / CAST(sum(execution_count) AS NUMERIC)) * 100
           END AS filter_ratio
    FROM (
        SELECT queryid, qualid, (unnested.records).*
        FROM (
            SELECT pqnh.qualid, pqnh.queryid, pqnh.dbid, pqnh.userid,
                   pqnh.coalesce_range, unnest(records) AS records
            FROM powa_qualstats_quals_history pqnh
            WHERE coalesce_range && tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]')
        ) AS unnested
        WHERE tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]') @> (records).ts
        UNION ALL
        SELECT queryid, qualid, pqnc.ts, pqnc.occurences, pqnc.execution_count, pqnc.nbfiltered
        FROM powa_qualstats_quals_history_current pqnc
        WHERE tstzrange('2017-10-27 17:16:02+0200', '2017-10-27 18:16:02+0200', '[]') @> pqnc.ts
    ) h
    JOIN powa_qualstats_quals pqnh USING (queryid, qualid)
    JOIN powa_statements ON powa_statements.queryid = pqnh.queryid
    GROUP BY qualid, powa_statements.queryid, powa_statements.dbid,
             powa_statements.query, quals
) AS anon_1
JOIN powa_databases ON powa_databases.oid = dbid
WHERE avg_filter > 1000 AND filter_ratio > 0.3
GROUP BY powa_databases.datname, qualid, execution_count, occurences,
         CAST(quals AS JSONB), avg_filter, filter_ratio
ORDER BY occurences DESC
LIMIT 200

powa-web is compatible with SQLAlchemy 0.8+, so I wonder whether you're using 0.5.5 or 0.8.2.

Can you check which one is used?
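
One way to see which SQLAlchemy the web UI actually imports (with both packages installed side by side, the interpreter may still pick up 0.5.5):

    import sqlalchemy

    # the version Python actually resolves, and where it lives on disk
    print(sqlalchemy.__version__)
    print(sqlalchemy.__file__)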

asr1901 commented 6 years ago

However, not all the queries that are failing in this way have constants. One query is simply: select distinct aaa as col_00 from b.c d order by f

asr1901 commented 6 years ago

Yes, that query returns 72 rows.

asr1901 commented 6 years ago

It looks like it's using 0.5.5, strangely enough. When I tried to uninstall the 0.5.5 version of SQLAlchemy, it said that it's a dependency of "powa_94-web". This machine is running Postgres 9.4.
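
On an RPM-based system, the reverse dependency can be inspected like this (package names as reported above):

    # list installed packages that declare a dependency on the old SQLAlchemy
    rpm -q --whatrequires python-sqlalchemy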

asr1901 commented 6 years ago

OK, I uninstalled the 0.5.5 version of SQLAlchemy and am now running the HEAD version of powa-web from GitHub. Unfortunately, the same error persists for the same queries:

    Traceback (most recent call last):
      File "/usr/lib/python2.6/site-packages/tornado/web.py", line 1021, in _execute
        getattr(self, self.request.method.lower())(*args, **kwargs)
      File "/root/powa-web-master/powa/query.py", line 207, in get
        self.get_argument("to"))
      File "/root/powa-web-master/powa/sql/__init__.py", line 402, in get_any_sample_query
        _from, _to, 'most executed')
      File "/root/powa-web-master/powa/sql/__init__.py", line 374, in get_unjumbled_query
        values[kind].get('constants', []))
    TypeError: 'NoneType' object is unsubscriptable

rjuju commented 6 years ago

Hello,

Sorry for coming back so late, I've unfortunately been quite busy :(

I tried multiple times to reproduce this problem, but I couldn't trigger this error.

Are there a lot of queries to process, and can you share them?

asr1901 commented 6 years ago

Yes, there are hundreds of queries across ~20 databases. Unfortunately, we had to stop using PoWA because the machines it was running on were upgraded to Postgres 10. I'm not sure whether we should have dropped the 'powa' database when upgrading from 9.4 to 10, but Postgres started reporting "out of shared memory" errors and even crashed/restarted during times of heavy use. Anecdotally, one query that was present in every crash log was "powa_take_snapshot()". I also frequently saw a lot of queries waiting on qualstats/powa during the work day. I'm not sure whether more optimization needs to be done for PoWA on PG 10, but we also identified that our /proc/sys/kernel/shmmax was set low. We have raised that parameter and may try PoWA again soon.
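
For reference, the kernel limit can be inspected and raised as follows; the value below is only an example, size it to cover shared_buffers plus the shared memory used by extensions:

    # current SysV shared memory ceiling, in bytes
    cat /proc/sys/kernel/shmmax
    # raise it for the running kernel (example: 16 GB)
    sysctl -w kernel.shmmax=17179869184
    # persist it across reboots
    echo 'kernel.shmmax = 17179869184' >> /etc/sysctl.conf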

rjuju commented 6 years ago

Ah, sorry to hear that :(

As far as I know, there wasn't any shared memory issue with any of the PoWA-related extensions, except https://github.com/dalibo/pg_stat_kcache/commit/9c926d3ef02b4dab709caf072da1a8ee75399387, which was fixed more than a year ago.

There have been some serious bugs involving crashes with pg_qualstats (see the latest commits), and I realize now that I never released 1.0.3 :( Shame on me, I could have sworn I did it more than a week ago. I'll probably do it one evening this week or next weekend. In any case, you shouldn't use pg_qualstats until 1.0.3 is out (and please come yell at me if I forget it again).

As for the performance issue: with default settings, pg_qualstats shouldn't be much of a bottleneck, since it samples only 1/max_connections of the queries. If you have a huge number of tps, you should consider lowering pg_qualstats.sampling_rate even further.
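
For example, in postgresql.conf (the parameter name is taken from the comment above; double-check the exact GUC name in the pg_qualstats documentation for your version):

    # sample roughly 1% of queries instead of the default 1/max_connections
    pg_qualstats.sampling_rate = 0.01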

rjuju commented 6 years ago

FTR, I just released pg_qualstats 1.0.3. I also tried to reproduce the "out of shared memory" issue, but without any luck so far.

rjuju commented 5 years ago

It has been some time since this issue was opened, and some work was done in PoWA 4 to improve the optimization part, so hopefully this has been fixed along the way. I'm closing this issue; feel free to reopen it or open a new one if needed!