repack broken in the presence of an ALL TABLES publication set

benchub commented 4 years ago

We have a new use case where it makes sense to make a publication set in PG12 for ALL TABLES. Sadly, this means all logged tables that will get updates need to have a replica identity, which is either the primary key, a non-null unique index that could be the primary key, or the entire row. pg_repack makes its new tables without a replica identity, and so postgres says "no" when pg_repack tries to do things.

Here is a quick example of the problem pg_repack will have:

...and here is an example of pg_repack actually having a problem:

INFO: repacking table "cluster24_shard_2873.enrollments"
ERROR: query failed: ERROR: cannot update table "table_21721" because it does not have a replica identity and publishes updates
HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
CONTEXT: SQL statement "UPDATE repack.table_21721 SET (id, user_id, course_id, type, uuid, workflow_state, created_at, updated_at, associated_user_id, sis_batch_id, start_at, end_at, course_section_id, root_account_id, completed_at, self_enrolled, grade_publishing_status, last_publish_attempt_at, stuck_sis_fields, grade_publishing_message, limit_privileges_to_course_section, last_activity_at, total_activity_time, role_id, graded_at, sis_pseudonym_id, last_attended_at) = ($2.id, $2.user_id, $2.course_id, $2.type, $2.uuid, $2.workflow_state, $2.created_at, $2.updated_at, $2.associated_user_id, $2.sis_batch_id, $2.start_at, $2.end_at, $2.course_section_id, $2.root_account_id, $2.completed_at, $2.self_enrolled, $2.grade_publishing_status, $2.last_publish_attempt_at, $2.stuck_sis_fields, $2.grade_publishing_message, $2.limit_privileges_to_course_section, $2.last_activity_at, $2.total_activity_time, $2.role_id, $2.graded_at, $2.sis_pseudonym_id, $2.last_attended_at) WHERE (id) = ($1.id)"
DETAIL: query was: SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)

I don't know enough about repack's internals to know if it makes sense to create the PK before applying changes from the scratch table, but that would maybe be an easy solution?

benchub commented 4 years ago

.... and just to clarify, I see from my postgres logs that the new table has the index that will become the PK already created before this error shows up, but only as a unique index, not a primary key.

DiegoDAF commented 1 year ago

I have a similar issue, a table with a table of 7GBs, I tried both methods with the uq index and Replica Identity: FULL, but it didn't work.

on AWS RDS, pg13 and pr_repack 1.4.6

create unique index concurrently idx_uq_xxxx on xxxx (id); 
alter table xxxx REPLICA IDENTITY USING INDEX idx_uq_xxxx;

ALTER TABLE xxxx REPLICA IDENTITY FULL;

\d+ xxxx  
                                                           Table "public.xxxx"                                                
           Column           |       Type        | Collation | Nullable |         Default         | Storage  | Stats target | Description  
----------------------------+-------------------+-----------+----------+-------------------------+----------+--------------+------------- 
 id                         | uuid              |           | not null | uuid_generate_v4()      | plain    |              |  
 title                      | character varying |           | not null |                         | extended |              |  
 job_template_id            | uuid              |           | not null |                         | plain    |              |  
 job_template_version       | integer           |           | not null |                         | plain    |              |  
 job_tree                   | bytea             |           | not null |                         | extended |              |  
 team_id                    | uuid              |           | not null |                         | plain    |              |  
 completed_at               | bigint            |           | not null |                         | plain    |              |  
 reason_completed           | character varying |           |          |                         | extended |              |  
 completed_steps            | integer           |           | not null | 0                       | plain    |              |  
 total_steps                | integer           |           | not null | 0                       | plain    |              |  
 created_at                 | bigint            |           | not null |                         | plain    |              |  
 updated_at                 | bigint            |           | not null |                         | plain    |              |  
 is_preview                 | boolean           |           | not null | false                   | plain    |              |  
 started_at                 | bigint            |           | not null | 0                       | plain    |              |  
 planned_start_at           | bigint            |           | not null | 0                       | plain    |              |  
 creator_id                 | uuid              |           |          |                         | plain    |              |  
 source_templates           | jsonb             |           |          |                         | extended |              |  
 metadata                   | jsonb             |           |          | '{}'::jsonb             | extended |              |  
 mesh_enabled               | boolean           |           | not null | false                   | plain    |              |  
 recurring_rule             | jsonb             |           |          |                         | extended |              |  
 recurred_info              | jsonb             |           |          |                         | extended |              |  
 fix_stats                  | boolean           |           |          |                         | plain    |              |  
 start_auto                 | boolean           |           |          |                         | plain    |              |  
 job_type                   | integer           |           | not null |                         | plain    |              |  
 using_events               | boolean           |           |          |                         | plain    |              |  
 src                        | jsonb             |           |          |                         | extended |              |  
 priority                   | integer           |           |          |                         | plain    |              |  
 lookup_id                  | integer           |           |          |                         | plain    |              |  
 deviations                 | jsonb             |           |          |                         | extended |              |  
 deviations_updated_at      | bigint            |           | not null | 0                       | plain    |              |  
 archived_at                | bigint            |           |          | 0                       | plain    |              |  
 started_by                 | uuid              |           |          |                         | plain    |              |  
 completed_by               | uuid              |           |          |                         | plain    |              |  
 location                   | character varying |           |          | ''::character varying   | extended |              |  
 attributes                 | jsonb             |           |          |                         | extended |              | 
 planned_start_job_role_ids | jsonb             |           |          |                         | extended |              | 
 referenced_fields          | jsonb             |           |          |                         | extended |              | 
 enable_step_tracking       | boolean           |           |          |                         | plain    |              | 
 completion_code            | character varying |           |          | '{}'::character varying | extended |              | 
 archived_by                | uuid              |           |          |                         | plain    |              | 
 canceled_at                | bigint            |           |          | 0                       | plain    |              | 
 canceled_by                | uuid              |           |          |                         | plain    |              | 
 expires_at                 | bigint            |           |          | 0                       | plain    |              | 
 team_active                | boolean           |           |          | true                    | plain    |              | 
 expires_after_seconds      | bigint            |           |          |                         | plain    |              | 
Indexes:
    "xxxx_pkey" PRIMARY KEY, btree (id)
    "gin_index_on_job_title" gin (to_tsvector('english'::regconfig, title::text))
    "gin_index_simple_dictionary_on_job_title" gin (to_tsvector('simple'::regconfig, title::text))
    "idx_uq_xxxx" UNIQUE, btree (id) REPLICA IDENTITY
    "index_xxxx_on_archived_by" btree (archived_by)
    "index_xxxx_on_canceled_by" btree (canceled_by)
    "index_xxxx_on_completed_at" btree (completed_at)
    "index_xxxx_on_completed_by" btree (completed_by)
    "index_xxxx_on_created_at" btree (created_at)
    "index_xxxx_on_creator_id" btree (creator_id)
    "index_xxxx_on_id" btree (id)
    "index_xxxx_on_job_template_id" btree (job_template_id)
    "index_xxxx_on_job_type_and_src_is_not_null" btree (job_type) WHERE src IS NOT NULL
    "index_xxxx_on_lookup_id" UNIQUE, btree (team_id, job_type, lookup_id)
    "index_xxxx_on_metadata" gin (metadata jsonb_path_ops)
    "index_xxxx_on_planned_start_job_role_ids" gin (planned_start_job_role_ids)
    "index_xxxx_on_priority" btree (priority)
    "index_xxxx_on_source_templates" gin (source_templates jsonb_path_ops)
    "index_xxxx_on_src_id_and_job_type" btree ((src ->> 'id'::text), (src ->> 'job_type'::text))
    "index_xxxx_on_started_at" btree (started_at)
    "index_xxxx_on_started_by" btree (started_by)
    "index_xxxx_on_team_id" btree (team_id)
    "index_xxxx_on_updated_at" btree (updated_at)
    "xxxx_src_index" btree ((src -> 'jobPtr'::text)) 
Foreign-key constraints:
    "xxxx_archived_by_fkey" FOREIGN KEY (archived_by) REFERENCES users(id) NOT VALID
    "xxxx_canceled_by_fkey" FOREIGN KEY (canceled_by) REFERENCES users(id) NOT VALID
    "xxxx_completed_by_fkey" FOREIGN KEY (completed_by) REFERENCES users(id) NOT VALID
    "xxxx_creator_id_fkey" FOREIGN KEY (creator_id) REFERENCES users(id) NOT VALID
    "xxxx_job_template_id_fkey" FOREIGN KEY (job_template_id, job_template_version) REFERENCES job_templates(id, version) NOT VALID
    "xxxx_started_by_fkey" FOREIGN KEY (started_by) REFERENCES users(id) NOT VALID
    "xxxx_team_id_fkey" FOREIGN KEY (team_id) REFERENCES teams(id) ON DELETE CASCADE
Referenced by:
    TABLE "debug_reports" CONSTRAINT "debug_reports_job_id_fkey" FOREIGN KEY (job_id) REFERENCES xxxx(id) ON DELETE CASCADE
    TABLE "execution_data" CONSTRAINT "execution_data_job_id_fkey" FOREIGN KEY (job_id) REFERENCES xxxx(id) ON DELETE CASCADE
    TABLE "job_deviations" CONSTRAINT "job_deviations_job_id_fkey" FOREIGN KEY (job_id) REFERENCES xxxx(id) ON DELETE CASCADE
    TABLE "job_users" CONSTRAINT "job_users_job_id_fkey" FOREIGN KEY (job_id) REFERENCES xxxx(id) ON DELETE CASCADE
    TABLE "messages" CONSTRAINT "messages_job_id_fkey" FOREIGN KEY (job_id) REFERENCES xxxx(id) ON DELETE CASCADE
Publications:
    "analytics_poc_pub"
    "dbz_publication"
Triggers:
    xxxx_created_at BEFORE INSERT ON xxxx FOR EACH ROW EXECUTE FUNCTION set_created_at_if_blank()
    xxxx_updated_at BEFORE INSERT OR UPDATE ON xxxx FOR EACH ROW WHEN (current_setting('mothership.skip_xxxx_updated_at'::text, true) IS NULL OR current_setting('mothership.skip_xxxx_updated_at'::text, true) <> 'true'::text) EXECUTE FUNCTION set_updated_at()
Access method: heap

LOG:    (param:3) = UPDATE repack.table_20745 SET (id, title, job_template_id, job_template_version, job_tree, team_id, completed_at, reason_completed, completed_steps, total_steps, created_at, updated_at, is_preview, started_at, planned_start_at, creator_id, source_templates, metadata, mesh_enabled, recurring_rule, recurred_info, fix_stats, start_auto, job_type, using_events, src, priority, lookup_id, deviations, deviations_updated_at, archived_at, started_by, completed_by, location, attributes, planned_start_job_role_ids, referenced_fields, enable_step_tracking, completion_code, archived_by, canceled_at, canceled_by, expires_at, team_active, expires_after_seconds) = ($2.id, $2.title, $2.job_template_id, $2.job_template_version, $2.job_tree, $2.team_id, $2.completed_at, $2.reason_completed, $2.completed_steps, $2.total_steps, $2.created_at, $2.updated_at, $2.is_preview, $2.started_at, $2.planned_start_at, $2.creator_id, $2.source_templates, $2.metadata, $2.mesh_enabled, $2.recurring_rule, $2.recurred_info, $2.fix_stats, $2.start_auto, $2.job_type, $2.using_events, $2.src, $2.priority, $2.lookup_id, $2.deviations, $2.deviations_updated_at, $2.archived_at, $2.started_by, $2.completed_by, $2.location, $2.attributes, $2.planned_start_job_role_ids, $2.referenced_fields, $2.enable_step_tracking, $2.completion_code, $2.archived_by, $2.canceled_at, $2.canceled_by, $2.expires_at, $2.team_active, $2.expires_after_seconds) WHERE (id) = ($1.id)
LOG:    (param:4) = DELETE FROM repack.log_20745 WHERE id IN (
LOG:    (param:5) = 1000
ERROR: query failed: ERROR:  cannot update table "table_20745" because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.
CONTEXT:  SQL statement "UPDATE repack.table_20745 SET (id, title, job_template_id, job_template_version, job_tree, team_id, completed_at, reason_completed, completed_steps, total_steps, created_at, updated_at, is_preview, started_at, planned_start_at, creator_id, source_templates, metadata, mesh_enabled, recurring_rule, recurred_info, fix_stats, start_auto, job_type, using_events, src, priority, lookup_id, deviations, deviations_updated_at, archived_at, started_by, completed_by, location, attributes, planned_start_job_role_ids, referenced_fields, enable_step_tracking, completion_code, archived_by, canceled_at, canceled_by, expires_at, team_active, expires_after_seconds) = ($2.id, $2.title, $2.job_template_id, $2.job_template_version, $2.job_tree, $2.team_id, $2.completed_at, $2.reason_completed, $2.completed_steps, $2.total_steps, $2.created_at, $2.updated_at, $2.is_preview, $2.started_at, $2.planned_start_at, $2.creator_id, $2.source_templates, $2.metadata, $2.mesh_enabled, $2.recurring_rule, $2.recurred_info, $2.fix_stats, $2.start_auto, $2.job_type, $2.using_events, $2.src, $2.priority, $2.lookup_id, $2.deviations, $2.deviations_updated_at, $2.archived_at, $2.started_by, $2.completed_by, $2.location, $2.attributes, $2.planned_start_job_role_ids, $2.referenced_fields, $2.enable_step_tracking, $2.completion_code, $2.archived_by, $2.canceled_at, $2.canceled_by, $2.expires_at, $2.team_active, $2.expires_after_seconds) WHERE (id) = ($1.id)"
DETAIL: query was: SELECT repack.repack_apply($1, $2, $3, $4, $5, $6)
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) SET search_path TO pg_catalog, pg_temp, public
LOG: (query) SELECT repack.repack_drop($1, $2)
LOG:    (param:0) = 20745
LOG:    (param:1) = 4

rotten commented 1 year ago

I would expect pg_repack to break logical replication on a table that is being replicated because the table structure could change when it is rebuilt, for example the attnum might get updated if there had been a deleted column in the old table.

What surprises me, and I think is related to this ticket, is running pg_repack on random tables that are not configured with pg_logical replication (in pg 14.6) will trash the logical replication stream for those other tables. This doesn't happen consistently on any given table, but also randomly.

I've tried reducing the number of jobs, putting sleeps between repacking each table, changing the order that tables are repacked (always carefully avoiding the ones in the replication stream), and running in DEBUG to look for hints as to why the replication stream randomly gets trashed. ... No luck so far.

rotten commented 1 year ago

I did more experiments on my laptop instead of in AWS and logical replication does not break when pg_repack is running. It appears something in AWS is breaking replication when we run pg_repack. I have no idea what that might be. When logical replication is not configured, pg_repack runs fine.

jayelkaake commented 1 month ago

I ran into this same issue even though all my tables had primary keys and logical replication was disabled.

The problem ended up being that when we ran RDS migration service a while ago to move to this new PG 13 database, it left behind a publication entry on all tables.

For anyone who might be in the same boat, you can check if there are any publications by running select * from pg_publication; SQL, then.

If there are, check if there are any subscriptions by running select * from pg_subscription;.

If the subscriptions table is empty and you're not doing any kind of migrations, then you should be able to delete the publication entry with DROP PUBLICATION publication_name; where publication_name is the name of the publication you saw in `pg_publication in the first step.

After dropping the bogus publication I was able to run the pg_repack command successfully and the cannot update table "table_1234" because it does not have a replica identity and publishes updates error went away. Viola! 🎉

reorg / pg_repack

repack broken in the presence of an ALL TABLES publication set #235