tvondra / pg_check

a tool to verify integrity of PostgreSQL data files
BSD 3-Clause "New" or "Revised" License
48 stars 17 forks source link

pg_check detects index issues on a freshly re-indexed table? #3

Open fake-name opened 7 years ago

fake-name commented 7 years ago

webarchive=# select pg_check_table('raw_web_pages', true, true);
NOTICE:  checking index: ix_raw_web_pages_distance
WARNING:  there are 1 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_netloc
WARNING:  there are 1 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_priority
WARNING:  there are 2 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_state
NOTICE:  checking index: ix_raw_web_pages_url
WARNING:  there are 828 differences between the table and the index
NOTICE:  checking index: raw_web_pages_pkey
WARNING:  there are 442 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_id
WARNING:  there are 309 differences between the table and the index
 pg_check_table
----------------
           1583
(1 row)

webarchive=# REINDEX raw_web_pages
webarchive=# REINDEX TABLE raw_web_pages;

REINDEX
webarchive=#
webarchive=# select pg_check_table('raw_web_pages', true, true);
NOTICE:  checking index: ix_raw_web_pages_distance
WARNING:  there are 5706 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_netloc
WARNING:  there are 5706 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_priority
WARNING:  there are 5722 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_state
WARNING:  there are 5706 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_url
WARNING:  there are 5711 differences between the table and the index
NOTICE:  checking index: raw_web_pages_pkey
WARNING:  there are 8195 differences between the table and the index
NOTICE:  checking index: ix_raw_web_pages_id
WARNING:  there are 8195 differences between the table and the index
 pg_check_table
----------------
          44941
(1 row)

webarchive=#

Relevant table:

webarchive=# \d raw_web_pages
                                        Table "public.raw_web_pages"
     Column      |            Type             |                         Modifiers
-----------------+-----------------------------+------------------------------------------------------------
 id              | bigint                      | not null default nextval('raw_web_pages_id_seq'::regclass)
 state           | dlstate_enum                | not null
 errno           | integer                     |
 url             | text                        | not null
 starturl        | text                        | not null
 netloc          | text                        | not null
 priority        | integer                     | not null
 distance        | integer                     | not null
 mimetype        | text                        |
 filename        | text                        |
 fspath          | text                        |
 fetchtime       | timestamp without time zone |
 addtime         | timestamp without time zone |
 ignoreuntiltime | timestamp without time zone | not null
Indexes:
    "raw_web_pages_pkey" PRIMARY KEY, btree (id)
    "ix_raw_web_pages_url" UNIQUE, btree (url)
    "ix_raw_web_pages_distance" btree (distance)
    "ix_raw_web_pages_id" btree (id)
    "ix_raw_web_pages_netloc" btree (netloc)
    "ix_raw_web_pages_priority" btree (priority)
    "ix_raw_web_pages_state" btree (state)

Not sure what's going on. Why does reindexing make it worse?

tvondra commented 7 years ago

I haven't really touched the code since 2012, and while the on-disk format did not change since that time, I'm sure I got some of the details wrong back then. So don't worry, I'm pretty sure the database is fine but pg_check is somewhat broken. Sorry for the confusion.

I plan to look at the code and fix it, hopefully in not too distant future.

Can you share some simple test case demonstrating the issue? Which PostgreSQL version is that?

fake-name commented 7 years ago

9.6.1, iirc.

I kind of assumed it was a pg_check issue, as the table I'm checking was just relocated from one table-space to another.

I don't have a compact test case, unfortunately. I tested on a few small tables, and it passed, but the table in the post is a few tens of gigabytes. The table is created from this project, fwiw (https://github.com/fake-name/ReadableWebProxy/blob/master/common/raw_archive_db.py#L36)

petergeoghegan commented 6 years ago

I tried building pg_check with assertions, just to try it.

I bet the immediate issue is with LP_REDIRECT and/or LP_DEAD item pointers that are not handled within bitmap_add_heap_items(). I saw an assertion failure there, which wasn't very hard to fix.

(Obviously what @tvondra says is still true; I just noticed this open issue in passing and thought I'd tell you what I know.)