Open fazalmajid opened 3 years ago
I'm doing something similar here: https://github.com/Ten0/pg_repack/commit/27ddafbd97415acb3310ac08a87458b82e58d10e
I have used that to reorder rows through joins, but not to filter rows out.
I'm wondering whether I can also use this to filter out rows (for a reason similar to yours). I assume this would cause issues if there were foreign key constraints (they wouldn't be checked), but I would expect it to be safe if there are none. Is that correct, or would swapping the tables after inserting fewer rows than there originally were cause issues?
Like all DBA-level operations, it should be used carefully, of course, and if you have foreign keys, you would need to repack/purge the tables in topological sort order. That said, we've used this in production with fairly large tables (100+GB) with no issues. You will typically need to quiesce traffic to the DB at the very beginning and very end so the DDL locks used to swap the tables can be taken.
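To make the "topological sort order" point concrete, here is a minimal sketch using Python's standard `graphlib`. The table names and the `fk_references` mapping are hypothetical, and the direction (referencing tables first, so child rows are purged before the rows they point to) is an assumption about the intended workflow, not something pg_repack computes for you.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables its foreign keys reference.
fk_references = {
    "accounts": set(),
    "events": {"accounts"},
    "event_notes": {"events"},
}


def repack_order(references):
    """Return table names with referencing (child) tables before the tables they reference."""
    # static_order() yields referenced (parent) tables first; reverse it so that
    # child rows are purged before the parent rows they point to (an assumption).
    parents_first = list(TopologicalSorter(references).static_order())
    return parents_first[::-1]


order = repack_order(fk_references)
print(order)
```

If your purge needs the opposite direction, drop the reversal. A cycle in the foreign keys makes `static_order()` raise `graphlib.CycleError`, which is a useful signal that the tables cannot be handled strictly one at a time.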
At my company we use pg_repack primarily as a tool to delete obsolete historical data, or to delete data for compliance reasons (e.g. GDPR). Running repack kills two birds with one stone: it removes bloat and also removes old data in cases where a long-running DELETE is impractical because of the maintenance window its full table locks would entail.
Our current implementation requires no C-level changes to the extension, just patching the PL/pgSQL code, using the migration below. Essentially we add a table repack.filters that has a WHERE clause specifying which rows to keep, and optionally a repack.joins table with JOIN clauses if we need information from other tables to make the decision. If there is no matching row in those tables, the existing pg_repack behavior (copy everything) holds.
I think this is a fairly valuable feature and use case for pg_repack, and if there is interest, I'd be happy to package it as a PR for inclusion upstream.
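As a rough illustration of the general idea (copy only the rows that pass a keep-filter, optionally joined against another table, then swap the copy in), here is a small self-contained sketch. It uses SQLite through Python's standard library purely so it can run anywhere. It is not the patch described above and not how pg_repack works internally: there are no triggers, no log replay, and no concurrency handling. The `repack_with_filter` helper and the `accounts`/`events` tables are hypothetical.

```python
import sqlite3


def repack_with_filter(conn, table, where=None, joins=None):
    """Copy the rows to keep into a new table, then swap it in for the original.

    With neither `where` nor `joins`, every row is copied (the "copy everything" case).
    Note: CREATE TABLE ... AS SELECT does not carry over indexes or constraints.
    The identifiers and clauses are trusted input here, as they would be for a DBA tool.
    """
    tmp = f"{table}_repacked"
    query = f"SELECT t.* FROM {table} AS t"
    if joins:
        query += f" {joins}"
    if where:
        query += f" WHERE {where}"
    conn.execute(f"CREATE TABLE {tmp} AS {query}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, closed INTEGER NOT NULL);
CREATE TABLE events (id INTEGER PRIMARY KEY, account_id INTEGER NOT NULL, payload TEXT);
INSERT INTO accounts VALUES (1, 0), (2, 1);
INSERT INTO events VALUES (10, 1, 'keep'), (11, 2, 'drop'), (12, 1, 'keep');
""")

# Keep only events that belong to accounts that are still open.
repack_with_filter(
    conn,
    "events",
    where="a.closed = 0",
    joins="JOIN accounts AS a ON a.id = t.account_id",
)
remaining = [row[0] for row in conn.execute("SELECT id FROM events ORDER BY id")]

# No filter and no join: all rows are copied.
repack_with_filter(conn, "accounts")
account_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
print(remaining, account_count)
```

The point of the sketch is only the shape of the copy query: the JOIN clause brings in the table needed for the decision, the WHERE clause states what to keep, and omitting both falls back to a full copy.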
So for instance if you have the table definitions:
And you want to repack out employees with expired contracts, you would: