shrutishrm512 opened 1 year ago
- This revert migration got completed on both the shards, but it is still showing migration_status as running, as the complete command did not get fired
@shrutishrm512 can you please clarify: when you say the migration got completed -- was the table schema for table participants reverted to before having the new code column?
7. It is not even retrying to rename the table; it is still showing the same error message
It looks to me like the revert migration was not completed. What is the table schema at this time?
@shlomi-noach It was showing progress as 100 and ready_to_complete as 1, but migration_status was running. The table schema was not reverted to the old one. Yes, it is not completed on that shard; it failed while renaming the table. The table schema at that time was:
CREATE TABLE `participants` (
`id` bigint NOT NULL,
`name` varchar(100) DEFAULT NULL,
`user_id` bigint DEFAULT NULL,
`address_type` varchar(50) DEFAULT NULL,
`created_on` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
`updated_on` timestamp(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3),
`code` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_participants_created_on` (`created_on`),
KEY `idx_participants_updated_on` (`updated_on`),
KEY `idx_user_id_created_on` (`user_id`,`created_on`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
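For context, the fields mentioned above (progress, ready_to_complete, migration_status) can be inspected from a VTGate session. A minimal sketch, assuming Vitess 16's SHOW VITESS_MIGRATIONS syntax; the UUID is taken from the shadow-table name that appears in the logs later in this thread:

-- Inspect a single migration's tracked state via VTGate:
SHOW VITESS_MIGRATIONS LIKE 'efb23f12_e822_11ed_bb83_0a629da756d6';
-- Columns of interest here: migration_status, progress, ready_to_complete.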
OK, thank you, this is less alarming to me now. So both migrations are incomplete, still running, but repeatedly unable to cut-over. Sorry I didn't ask before -- but can you look at the vttablet logs for the PRIMARY (either one of the shards where it fails) and look for the entries around that failure, please?
Primary vttablet logs are present in log fragments.
Not sure what you mean by "present in log fragments"? If logs are available from around the cut-over time where the RENAME fails, can you please paste them?
Primary vttablet logs are present in log fragments.
Oh, you mean the "Log Fragments" section in the original comment. Thank you. I'm looking for a bit more around that time, if there's anything at all? Also, I presume these logs are recurring? I expect the error to reappear every minute or so in the logs?
Actually, from the logs it looks like it made only two attempts. There were no logs after timestamp 15:49:05.472031. Only the logs below, related to the migration, were visible.
E0502 15:49:05.469918 4614 dbconn.go:436] Could not kill query ID 21 : Unknown thread id: 21 (errno 1094) (sqlstate HY000) during query: kill 21
E0502 15:49:05.472017 4614 executor.go:3630] cutOverVReplMigration failed: err=Code: ABORTED
timeout for rename query: RENAME TABLE `txn_participants` TO `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856`, `_efb23f12_e822_11ed_bb83_0a629da756d6_20230501185027_vrepl` TO `participants`, `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856` TO `_efb23f12_e822_11ed_bb83_0a629da756d6_20230501185027_vrepl`
E0502 15:49:05.472031 4614 executor.go:3957] Code: ABORTED
timeout for rename query: RENAME TABLE `participants` TO `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856`, `_efb23f12_e822_11ed_bb83_0a629da756d6_202client_loop: send disconnect: Broken pipe `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856` TO `_efb23f12_e822_11ed_bb83_0a629da756d6_2023050118'
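To make the failing statement easier to read, here is the same three-way swap from the first log entry above, reformatted with comments (copied verbatim, including the txn_participants/participants naming exactly as it appears in that log line):

RENAME TABLE
  `txn_participants` TO `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856`,   -- park the current table under a temporary HOLD name
  `_efb23f12_e822_11ed_bb83_0a629da756d6_20230501185027_vrepl` TO `participants`,     -- promote the vreplication shadow table to the real name
  `_vt_HOLD_bef655f6e8d211eda1e90292d32412fc_20230503101856` TO `_efb23f12_e822_11ed_bb83_0a629da756d6_20230501185027_vrepl`; -- parked table takes over the shadow name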
OK, thank you. I'm going to need more time to reproduce this. The main question is not why the RENAME timed out, but rather why there wasn't a retry.
Actually, the RENAME apparently never took place. The timeout is in waiting for the RENAME to appear in the processlist.
To reproduce the issue I'll just run an early return before even waiting for the processlist, and see whether there are no cut-over retries, and if so, why.
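For reference, "waiting for the RENAME to appear in the processlist" is observable with a plain MySQL query on the shard primary. A small sketch, assuming standard information_schema:

-- Watch for the cut-over's RENAME while the migration attempts cut-over:
SELECT id, time, state, info
FROM information_schema.processlist
WHERE info LIKE 'RENAME TABLE%';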
I'm unable to reproduce. In my experiments, the Online DDL executor correctly retries the migration even if the RENAME statement times out. I am not sure why, in the original issue's scenario, the migration was not retried.
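For anyone landing here with a migration stuck in this state: a migration can normally be nudged by hand from a VTGate session. A minimal sketch, assuming Vitess 16's ALTER VITESS_MIGRATION syntax and this thread's migration UUID; note that RETRY ordinarily applies to failed or cancelled migrations, so whether it helps here is an open question:

-- Ask the executor to retry a migration that failed to cut over:
ALTER VITESS_MIGRATION 'efb23f12_e822_11ed_bb83_0a629da756d6' RETRY;
-- Status can then be re-checked with SHOW VITESS_MIGRATIONS as above.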
Let me try to reproduce this issue again; I will post the results here.
Overview of the Issue
I am facing a RENAME table issue while running Online DDL with the "vitess" strategy and the postpone-completion option (testing postponed migrations: https://vitess.io/docs/16.0/user-guides/schema-changes/postponed-migrations/).
The ALTER is not getting completed on all the shards during cut-over. A sketch of the command shape involved follows below.
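This is the rough sequence from a VTGate session per the linked guide; the ALTER here is illustrative rather than the exact statement used, and '<uuid>' is a placeholder for the UUID the ALTER returns:

SET @@ddl_strategy = 'vitess --postpone-completion';
ALTER TABLE participants ADD COLUMN code bigint;  -- returns a migration UUID immediately
-- later, when ready to cut over:
ALTER VITESS_MIGRATION '<uuid>' COMPLETE;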
Reproduction Steps
As checked, the _vt_HOLD table was present in the shard, but it is still not able to rename the table, and the vttablet logs do not show a clear reason for the cut-over failure.
7. It is not even retrying to rename the table; it is still showing the same error message.
I executed the same RENAME TABLE query directly on the shard and it succeeded, but the migration is still in the running state.
Operating System and Environment details
Log Fragments