mkellycs / postgres_index_integrity_check

This utility provides a quick and dirty smoke test to evaluate integrity of all of the indexes in a postgres database.
MIT License
2 stars 3 forks source link

Handling index only scans #1

Open keithf4 opened 9 years ago

keithf4 commented 9 years ago

So I noticed your check was skipping almost all the indexes in our database. I went and looked and it seems to be because they're doing index only scans.

Line from the log file:

INFO:  (orders_status_created_at_idx) Starting
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (orders_status_created_at_idx) Definition: CREATE INDEX orders_status_created_at_idx ON orders USING btree (status, created_at)
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (orders_status_created_at_idx) constructed query: SELECT cur, lag FROM (SELECT (status, created_at) as cur, lag ((status, created_at)) OVER (ORDER BY status, created_at)FROM orders WHERE 1=1 AND status IS NOT NULL  AND  created_at IS NOT NULL ) f WHERE cur < lag
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
WARNING:  (orders_status_created_at_idx) unable to convince query planner to use index for unknown reason.
CONTEXT:  PL/pgSQL function ta_check_index(regclass) line 73 at IF
PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  Current Progress { total : 466, skipped 407, invalid 0, bad_records 0 }

Running the constructed query:

# explain SELECT cur, lag FROM (SELECT (status, created_at) as cur, lag ((status, created_at)) OVER (ORDER BY status, created_at)FROM orders WHERE 1=1 AND status IS NOT NULL  AND  created_at IS NOT NULL ) f WHERE cur < lag;
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on f  (cost=0.00..2851170.88 rows=8139869 width=64)
   Filter: (f.cur < f.lag)
   ->  WindowAgg  (cost=0.00..2545925.81 rows=24419606 width=10)
         ->  Index Only Scan using orders_status_created_at_idx on orders  (cost=0.00..2118582.70 rows=24419606 width=10)
               Index Cond: ((status IS NOT NULL) AND (created_at IS NOT NULL))

Looks like ta_plan_scans_index() is only checking specifically for "Index Scan" and missing the keyword "Only" in there. Not sure if your query further down can handle when an index only scan happens, so didn't know if just changing that was enough to fix this.

Thanks for looking into this issue and providing this test!

mkellycs commented 9 years ago

Hey Keith,

Good catch. I'm running 9.1 in my production environment, and index only scans were added in 9.2.

Index only scans only mean that visibility is checked using the visibility map instead of going back to the table. Because the test is only interested in the indexes ordering, everything else should work.

Its just that regex that needs to be fixed. If you've already fixed the regex locally to handle both cases, feel free to file a pull request. If not, I'll try to have a revised version up this evening.

Thanks for catching this!

mkellycs commented 9 years ago

Fixed with: https://github.com/mkellycs/postgres_index_integrity_check/pull/2

Thanks. :) I'd be interested to hear your results.

keithf4 commented 9 years ago

Hmm... still running into a weird issue with some index scans both normal and index only. For some reason the second LIKE statement isn't finding the index name in the EXPLAIN line. I checked the table, and these indexes aren't redundant, and not sure why the LIKE isn't matching.

INFO:  (email.email_abandoned_cart_history_201410_created_at_user_guid_idx) Starting
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (email.email_abandoned_cart_history_201410_created_at_user_guid_idx) Definition: CREATE INDEX email_abandoned_cart_history_201410_created_at_user_guid_idx ON email.email_abandoned_cart_history_201410 USING btree (created_at, user_guid)
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (email.email_abandoned_cart_history_201410_created_at_user_guid_idx) constructed query: SELECT cur, lag FROM (SELECT (created_at, user_guid) as cur, lag ((created_at, user_guid)) OVER (ORDER BY created_at, user_guid)FROM email.email_abandoned_cart_history_201410 WHERE 1=1 AND created_at IS NOT NULL  AND  user_guid IS NOT NULL ) f WHERE cur < lag
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
WARNING:  (email.email_abandoned_cart_history_201410_created_at_user_guid_idx) is probably redundant and should be removed.  See:         ->  Index Only Scan using email_abandoned_cart_history_201410_created_at_user_guid_idx on email_abandoned_cart_history_201410  (cost=0.00..52.59 rows=416 width=156)
CONTEXT:  PL/pgSQL function ta_check_index(regclass) line 73 at IF
PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  Current Progress { total : 1, skipped 1, invalid 0, bad_records 0 }

# explain SELECT cur, lag FROM (SELECT (created_at, user_guid) as cur, lag ((created_at, user_guid)) OVER (ORDER BY created_at, user_guid)FROM email.email_abandoned_cart_history_201410 WHERE 1=1 AND created_at IS NOT NULL  AND  user_guid IS NOT NULL ) f WHERE cur < lag;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on f  (cost=0.00..65.07 rows=139 width=64)
   Filter: (f.cur < f.lag)
   ->  WindowAgg  (cost=0.00..59.87 rows=416 width=156)
         ->  Index Only Scan using email_abandoned_cart_history_201410_created_at_user_guid_idx on email_abandoned_cart_history_201410  (cost=0.00..52.59 rows=416 width=156)
               Index Cond: ((created_at IS NOT NULL) AND (user_guid IS NOT NULL))
INFO:  (email.email_messages_201409_lower_idx) Starting
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (email.email_messages_201409_lower_idx) Definition: CREATE INDEX email_messages_201409_lower_idx ON email.email_messages_201409 USING btree (lower((email_address)::text))
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  (email.email_messages_201409_lower_idx) constructed query: SELECT cur, lag FROM (SELECT (lower((email_address)::text)) as cur, lag ((lower((email_address)::text))) OVER (ORDER BY lower((email_address)::text))FROM email.email_messages_201409 WHERE 1=1 AND lower((email_address)::text) IS NOT NULL ) f WHERE cur < lag
CONTEXT:  PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
WARNING:  (email.email_messages_201409_lower_idx) is probably redundant and should be removed.  See:         ->  Index Scan using email_messages_201409_lower_idx on email_messages_201409  (cost=0.00..260559.08 rows=1566433 width=21)
CONTEXT:  PL/pgSQL function ta_check_index(regclass) line 73 at IF
PL/pgSQL function ta_check_collated_index_integrity() line 9 at assignment
INFO:  Current Progress { total : 400, skipped 157, invalid 0, bad_records 0 }

# explain SELECT cur, lag FROM (SELECT (lower((email_address)::text)) as cur, lag ((lower((email_address)::text))) OVER (ORDER BY lower((email_address)::text))FROM email.email_messages_201409 WHERE 1=1 AND lower((email_address)::text) IS NOT NULL ) f WHERE cur < lag;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Subquery Scan on f  (cost=0.00..312317.95 rows=523570 width=64)
   Filter: (f.cur < f.lag)
   ->  WindowAgg  (cost=0.00..292684.08 rows=1570709 width=21)
         ->  Index Scan using email_messages_201409_lower_idx on email_messages_201409  (cost=0.00..261269.90 rows=1570709 width=21)
               Index Cond: (lower((email_address)::text) IS NOT NULL)
mkellycs commented 9 years ago

After 10 minutes of staring at this:

The problem is that the passed in name is schema qualified but the explain plan has the name without the qualification.

keithf4 commented 9 years ago

Hmm... I thought that might've been it but then I thought the wildcards on either side of the index name would've caught that?

keithf4 commented 9 years ago

Oh! Had that backwards. I see what you mean.

mkellycs commented 9 years ago

I'm thinking of just lopping off the schema name right here. I'll have a patch up in just a couple minutes.

keithf4 commented 9 years ago

Was going to do the same. I'll just wait for your patch then.

mkellycs commented 9 years ago

Seem reasonable? https://github.com/mkellycs/postgres_index_integrity_check/pull/3

mkellycs commented 9 years ago

I merged that back into master. Can you test and see if this fixes your issue?

keithf4 commented 9 years ago

Only skipped 5 indexes out of 468 now. 4 of them were because it was an index on the parent table of a partitioned set, so the plan comes out differently.
Last one was a valid redundant find. There was both a unique index and unique constraint on the same column. So far looks good. :)