powa-team / powa-collector

PoWA Collector daemon for remote snapshot
http://powa.readthedocs.io/
PostgreSQL License
7 stars 4 forks source link

Index Suggestions still showing same existing indexes #4

Closed wagner-penguinformula closed 4 years ago

wagner-penguinformula commented 5 years ago

Hi, after that I created some indexes as powa suggested me, even after some weeks, some indexes still being shown on powa-web. How can I solve it?

Thank you!

wagner-penguinformula commented 5 years ago

btw: I'm running on a Ubuntu 16.04 server with Postgres 9.6.14

rjuju commented 5 years ago

Hello,

Are you talking about the "global suggestion" (in the per-database page), or the "per-query" index suggestion?

Did you do additional testing to validate the suggested indexes usefulness? Are the indexes that are still suggested the same as the one you created or different one?

wagner-penguinformula commented 5 years ago

I believe that this is the per-query: [image: image.png] But this is showed after "Optimize this database !" button

On Tue, Jul 30, 2019 at 4:26 PM Julien Rouhaud notifications@github.com wrote:

Hello,

Are you talking about the "global suggestion" (in the per-database page), or the "per-query" index suggestion?

Did you do additional testing to validate the suggested indexes usefulness? Are the indexes that are still suggested the same as the one you created or different one?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-collector/issues/4?email_source=notifications&email_token=ALWECEB7GUDDO6ZW4EGIAULQCBMT5A5CNFSM4IH5OW6KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3ELF4Q#issuecomment-516469490, or mute the thread https://github.com/notifications/unsubscribe-auth/ALWECEHFMZAVA6PMYRPHWNLQCBMT5ANCNFSM4IH5OW6A .

rjuju commented 5 years ago

Unfortunately the picture didn't survive the mail processing.

wagner-penguinformula commented 5 years ago

powa attached

rjuju commented 5 years ago

Thanks! I confirm this is the global optimization one. The suggestion looks sane given the WHERE clauses.

So if you created those two indexes weeks ago, and you still get those results if you use the suggestion on current data (like last hour or similar)?

Could you manually check a few queries to see if they're indeed not using the indexes?

rjuju commented 5 years ago

Could you also confirm the version of powa-web, powa and pg_qualstats you're using?

wagner-penguinformula commented 5 years ago

Powa-Web - 3.2.0 powa - apt get postgresql-9.6-powa qualstats: postgresql-9.6-pg-qualstats

wagner-penguinformula commented 5 years ago

Running the queries manually with Explain Analyze, the queries are running against the indexes created. But it's still shown on powa-web

rjuju commented 5 years ago

That's very strange. Is this the result when using the default timerange (last hour)? If yes, can you run this query on the powa database, modifying MY_DATABASE_NAME according to your target database name?

SELECT array_agg(queryid) AS queryids, qualid, CAST(quals AS JSONB) AS quals, occurences, execution_count, array_agg(query) AS queries, avg_filter, filter_ratio
    FROM (SELECT qualid, powa_statements.queryid AS queryid, query, powa_statements.dbid AS dbid, to_json(quals) AS quals, sum(execution_count) AS execution_count, sum(occurences) AS occurences, sum(nbfiltered) / sum(occurences) AS avg_filter, CASE WHEN (sum(execution_count) = 0) THEN 0 ELSE (sum(nbfiltered) / CAST(sum(execution_count) AS NUMERIC)) * 100 END AS filter_ratio
    FROM
        (
        SELECT queryid, qualid, (unnested.records).*
        FROM (
            SELECT pqnh.qualid, pqnh.queryid, pqnh.dbid, pqnh.userid, pqnh.coalesce_range, unnest(records) as records
            FROM powa_qualstats_quals_history pqnh
            WHERE coalesce_range  && tstzrange(now() - interval '1 hour', now(), '[]')
        ) AS unnested
        WHERE tstzrange(now() - interval '1 hour', now(), '[]') @> (records).ts
        UNION ALL
        SELECT queryid, qualid, pqnc.ts, pqnc.occurences, pqnc.execution_count, pqnc.nbfiltered
        FROM powa_qualstats_quals_history_current pqnc
        WHERE tstzrange(now() - interval '1 hour', now(), '[]') @> pqnc.ts
        ) h
        JOIN powa_qualstats_quals pqnh USING (queryid, qualid)
         JOIN powa_statements ON powa_statements.queryid = pqnh.queryid GROUP BY qualid, powa_statements.queryid, powa_statements.dbid, powa_statements.query, quals) AS anon_1 JOIN powa_databases ON powa_databases.oid = dbid
    WHERE powa_databases.datname = 'MY_DATABASE_NAME'
    AND avg_filter > 1000 AND filter_ratio > 0.3
    GROUP BY qualid, execution_count, occurences, CAST(quals AS JSONB), avg_filter, filter_ratio
    ORDER BY occurences DESC
     LIMIT 200
;
wagner-penguinformula commented 5 years ago

If I run the powa-web with one hour, those indexes does not appear as suggestions.

Running this query manually, I got the queries for the "Details for all queries" block on the powa-web interface.

rjuju commented 5 years ago

If I run the powa-web with one hour, those indexes does not appear as suggestions.

What interval do you use to have the extraneous indexes suggestion?

Running this query manually, I got the queries for the "Details for all queries" block on the powa-web interface.

I'm confused. This is the query used by the wizard to retrieve the list of quals that are worth optimizing, so if this query returns something for the last hour, you should also see them on the wizard / global optimization block.

wagner-penguinformula commented 5 years ago

For the extraneous indexes suggestions, I tried 3 days of interval.

wagner-penguinformula commented 5 years ago

A doubt: The column order, matter for Postgres, right? For example: I must start the indexes always with columns that will filter more data.

I ask this, because powa is suggesting me strange column orders. example:

CREATE INDEX ON public.tbl_events USING btree(tenant_tenantid,sharable_sharableid,active); where tenantId is a column that will not filter so many data.

Another question: I have an index with (enddate, startdate), but powa is suggesting me this index, and I think this does not make senses: CREATE INDEX ON public.tbl_events USING btree(startdate,enddate)

Thank you!

rjuju commented 5 years ago

CREATE INDEX ON public.tbl_events USING btree(tenant_tenantid,sharable_sharableid,active); where tenantId is a column that will not filter so many data.

In this case, tenant_tenanit is put first so that more queries can benefit from this index (there is apparently a query with only WHERE tenant_tenantid = ?).

I have an index with (enddate, startdate), but powa is suggesting me this index, and I think this does not make senses: CREATE INDEX ON public.tbl_events USING btree(startdate,enddate)

Given the WHERE clauses you have, powa won't chose a specific column order. It also ignores existing indexes. This index is suggested because there are queries during your selected interval that have quals on both columns that do no use an index, so powa suggest you one (same for the other index).

That's btw your root issue here. It seems that even though you have those indexes, sometimes queries are run but don't use existing indexes.

wagner-penguinformula commented 5 years ago

In this case, tenant_tenanit is put first so that more queries can benefit from this index (there is apparently a query with only WHERE tenant_tenantid = ?).

yes, there is a query with only tenant_tenantid. but also, there is an index with only tenant_tenantid

Given the WHERE clauses you have, powa won't chose a specific column order. It also ignores existing indexes. This index is suggested because there are queries during your selected interval that have quals on both columns that do no use an index, so powa suggest you one (same for the other index).

That's btw your root issue here. It seems that even though you have those indexes, sometimes queries are run but don't use existing indexes.

So, this was my question. I have queries that are already indexed, but still showing on powa-web. If this is not related to powa, how can I force Postgres to use the indexes? (manually, the explain analyze uses the indexes for sure)

Thanks!

rjuju commented 5 years ago

yes, there is a query with only tenant_tenantid. but also, there is an index with only tenant_tenantid

That's likely the same issue. The existing index is (at least some times) not used, so pg_qualstats detect a sequential scan on it and powa tries to optimize it.

how can I force Postgres to use the indexes? (manually, the explain analyze uses the indexes for sure)

Well, we'd need to understand why postgres isn't using the index in the first place, especially if manual testing shows that the index is used.

Do you have some tenantid values which represent 80% of the table for instance? That would explain that the index is sometimes unused. Do you otherwise have queries that are sometimes way slower than usual? (powa interface should show that). If you can't find any evidence, I'll write a query trying to detect when such quals are detected, that may help to narrow down the root issue.

wagner-penguinformula commented 5 years ago

Hi Julien! Thanks for your help. Yes, I have in this database, a schema with a tenantId with more than 80% of the table. So it explains why the index sometimes is not being used.

But about endDate and startDate, that I have already indexed, and still appearing on powa-web, I don't understand why.

The column order matter for indexes on postgres, right? But, in the query if I have endDate first or after startDate, does it matter for which indexes the database will use?

rjuju commented 5 years ago

Mmm, actually the query that chooses the qual to be optimized pick the one the one that filter at 1k rows and at least 30% of the table. Maybe we could make it more aggressive (or ideally configurable).

The column order matter for indexes on postgres, right?

It depends on the kind of index, but for btree it definitely matters.

But, in the query if I have endDate first or after startDate, does it matter for which indexes the database will use?

No, as long as you have both column in an AND-ed expression the index can be used.

But about endDate and startDate, that I have already indexed, and still appearing on powa-web, I don't understand why.

Powa could have suggested exactly the same index instead of one with column in reversed order. Since there are (apparently) no queries having a requirement for the first column, the one output is somewhat random. That's here the same problem, you probably have some values that don't trigger an index scan, but which are probably selective enough to be picked up by powa.

You could try to change your local version to only pck more selective predicates. For that, you'd need to edit the powa/wizard.py' file, and inWizardMetricGroup`, modify this line:

            .where(column("filter_ratio") > 0.3)

with for instance

            .where(column("filter_ratio") > 0.5)

and restart powa-web.