reorg / pg_repack

Reorganize tables in PostgreSQL databases with minimal locks
BSD 3-Clause "New" or "Revised" License
1.85k stars 170 forks source link

over-ruled default search_path breaks index usage #396

Open alexitheodore opened 4 months ago

alexitheodore commented 4 months ago

I have tables which I am trying to repack (and move to a new tablespace, though I doubt that is relevant) and everything works as normal until the end of the process where the DELETE FROM repack.log_... WHERE id IN ( phase begins. Normally, each set of id's to delete should go relatively quickly. However, in this case they take a VERY long time (4-5 hours each query iteration) and my suspicion is that the DELETE is not catching the index as it should be because the data type of id will be custom and from lots of past experience, this happens when the default search_path is modified. The custom data type is not installed on the "public" schema (in fact, there is no schema "public").

I'm not sure exactly how to resolve this, but possibly, the search_path wouldn't be modified from default for that one operation?

andreasscherbaum commented 4 months ago

I'm not following: an index is assigned to a table, and the planner knows which indexes exist for a table. Doesn't matter where the index lives.

Is the index present on the table? Did you run ANALYZE recently? Any chance you can provide a query plan?

alexitheodore commented 4 months ago

The index is not the issue, its the data type that is the issue. So for example, if the column's data type is CITEXT and that extension was created on schema XYZ (where XYZ ≠ public OR pg_catalog) then the index won't be used (even if it exists and works just fine normally when using the default search path, which includes XYZ) if the search path has been set to not include XYZ.

I've tested simulating the delete commands and they work just fine - but the difference is that I am running them with the default search_path and not the one that pg_repack likely uses. I don't know for sure what it uses, but from the source code, it looks like it does something like "pg_temp, pg_catalog, public" assuming everyone puts everything into public.

andreasscherbaum commented 4 months ago

Can you please show the entire table definition, as well as the query plan with and without setting search_path?

za-arthur commented 4 months ago

DELETE FROM repack.log_... WHERE id IN is executed only on a log table which has only a PRIMARY KEY, where id is bigserial, and it won't use an index from the original table. I suspect that you might have many updates on the original table which leads to that pg_repack spends more time to apply changes. Or it is a different issue, it is hard to say without more information.

alexitheodore commented 4 months ago

Ah ok, so then it is a false correlation on my part. I will try to replicate and let you know. It takes about 24 hours to run and produce the issue and by then I am very eager to make it stop, so forgive me for not having all the evidence readily available. But thanks for thinking it through with me.