Open Shukla-Ankur opened 1 year ago
changing the column name to sample_column
isn't helping things -- in the binlog row you posted, which column corresponds to sample_column
?
Does the row in maxwell
.schemas
row still exist?
Manually messing around with maxwell's position can lead to schema inconsistencies, if for instance you fail to process the change from one column type to another. I don't think that's what happened here.
When maxwell restarted did it hit the error condition again? I'm trying to figure out if the problem is with the schema data in mysql or in-memory.
Oh, that's my bad. It is the tenant_id column. I have removed 'sample_column' name.
ok. So I still have the same questions:
sample_table
since 11/8?It's very odd to me that maxwell processed the schema entry correctly and wrote it out to disk but somehow the in-memory version had an incorrect schema...
do you have more maxwell logs from the month?
The only event that should cause a schema to change is processing an ALTER or other DDL statement. was there any indication of that happening around the time of the row update?
what was the binlog position of the row that caused the error? Can you dump that binlog event? Is there anything special or unusual about that row compared to other rows in the same table?
BTW if this happens again please don't just DROP DATABASE maxwell
-- rename it out of the way so you have a copy of the schema at the time -- it'd help debug this problem a lot.
let me gather and share all the maxwell logs I have. I still have the .ibd files for the DB. I'm not sure, but I'll try to restore it on another mysql instance. And yep, right mate, should have taken a backup at least or renamed it simply.
it's like maxwell reset to an older version of the schema. could the binlog numbers have wrapped around or something crazy like that? At what position did the crash happen?
#221207 5:19:48 server id 4 end_log_pos 172365598 Query thread_id=667853 exec_time=0 error_code=0
SET TIMESTAMP=1670390388/*!*/;
BEGIN
/*!*/;
# at 172365598
#221207 5:19:48 server id 4 end_log_pos 172365671 Table_map: `maxwell`.`positions` mapped to number 203
# at 172365671
#221207 5:19:48 server id 4 end_log_pos 172365945 Update_rows: table id 203 flags: STMT_END_F
### UPDATE `maxwell`.`positions`
### WHERE
### @1=0
### @2='binlog.000007'
### @3=172364491
### @4='a772f477-e01f-4d39-b8b5-287d50885cf6:1-1786504'
### @5='prod-cdc-rs-pageactivity-widgetlibrary-1'
### @6=NULL
### @7=1670390377818
### SET
### @1=0
### @2='binlog.000007'
### @3=172365415
### @4='a772f477-e01f-4d39-b8b5-287d50885cf6:1-1786506'
### @5='prod-cdc-rs-pageactivity-widgetlibrary-1'
### @6=NULL
### @7=1670390387825
# at 172365945
#221207 5:19:48 server id 4 end_log_pos 172365972 Xid = 98718999
COMMIT/*!*/;
# at 172365972
#221207 5:19:57 server id 4 end_log_pos 172366054 GTID last_committed=132262 sequence_number=132263 rbr_only=yes original_committed_timestamp=1670390397814432 immediate_commit_timestamp=1670390397815582 transaction_length=394
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
# original_commit_timestamp=1670390397814432 (2022-12-07 05:19:57.814432 UTC)
# immediate_commit_timestamp=1670390397815582 (2022-12-07 05:19:57.815582 UTC)
/*!80001 SET @@session.original_commit_timestamp=1670390397814432*//*!*/;
/*!80014 SET @@session.original_server_version=80023*//*!*/;
/*!80014 SET @@session.immediate_server_version=80023*//*!*/;
SET @@SESSION.GTID_NEXT= 'a772f477-e01f-4d39-b8b5-287d50885cf6:1786509'/*!*/;
# at 172366054
#221207 5:19:57 server id 4 end_log_pos 172366137 Query thread_id=667853 exec_time=0 error_code=0
SET TIMESTAMP=1670390397/*!*/;
BEGIN
/*!*/;
# at 172366137
#221207 5:19:57 server id 4 end_log_pos 172366199 Table_map: `maxwell`.`heartbeats` mapped to number 208
# at 172366199
#221207 5:19:57 server id 4 end_log_pos 172366339 Update_rows: table id 208 flags: STMT_END_F
### UPDATE `maxwell`.`heartbeats`
### WHERE
### @1=0
### @2='prod-cdc-rs-pageactivity-widgetlibrary-1'
### @3=1670390387825
### SET
### @1=0
### @2='prod-cdc-rs-pageactivity-widgetlibrary-1'
### @3=1670390397834
# at 172366339
do you have binlogs for the update/insert maxwell actually crashed on? the one with tenant_id
172366339 is the last binlog position recoded by maxwell while in error state. Following is the latest binlog entry for the same table :
`# at 172366978
SET TIMESTAMP=1670390407/!/; BEGIN /!/;
The column of concern is 4
ok, now can you find a previous mention of sample_table
in the same binlog file? i have a theory (not a great one) that it's something about the table_id
or the table cache...
This is the last change for table I could find before error:
`
COMMIT/!/; `
@osheroff this happened again. This time, this happened for multiple tables where we had updated the column type. Weird thing is that the alters were run on 9th Dec. We continued to publish CDC correctly after that till today. I could see updated entry in the schemas table of maxwell. I have kept the maxwell DB this time. Let me know what data you need for this.
let's see what's in maxwell.columns
for the table(s) in questions, as well as everything in maxwell.schemas
.
Wondering aloud -- could schema compaction trigger this? Anything in the maxwell logs about "schema compaction"?
Latest error recorded
6335 [ERROR] BinlogConnectorReplicator: Unable to cast 1592752282412193923 (java.lang.Long) into column sample_db.module.tenant_id (type 'varchar') 6379 [ERROR] Maxwell: checking for schema inconsistencies in sample_db.module 6416 [ERROR] Maxwell: sample_db.module.tenant_id has a type mismatch, varchar vs bigint in new 6416 [ERROR] Maxwell: sample_db.module.tenant_id has a type mismatch, bigint vs varchar in old
For the DB in latest corruption, I see 2 entries in maxwell.databases table
| 13 | 1 | sample_db | utf8mb4
| 37 | 1465 | sample_db | utf8mb4
entry in the maxwell.tables for the DB table for which maxwell failed
| 519 | 1465 | 36 | module
select * from columns where table_id=519\G
*************************** 1. row *************************** id: 7522 schema_id: 1465 table_id: 519 name: f_cdc charset: NULL coltype: bit is_signed: 0 enum_values: NULL column_length: NULL *************************** 2. row *************************** id: 7523 schema_id: 1465 table_id: 519 name: company_id charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 3. row *************************** id: 7524 schema_id: 1465 table_id: 519 name: module_id charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 4. row *************************** id: 7525 schema_id: 1465 table_id: 519 name: name charset: utf8mb4 coltype: varchar is_signed: 0 enum_values: NULL column_length: NULL *************************** 5. row *************************** id: 7526 schema_id: 1465 table_id: 519 name: module_version charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 6. row *************************** id: 7527 schema_id: 1465 table_id: 519 name: max_score charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 7. row *************************** id: 7528 schema_id: 1465 table_id: 519 name: state charset: NULL coltype: enum is_signed: 0 enum_values: ["COPYING_FILES"] column_length: NULL *************************** 8. row *************************** id: 7529 schema_id: 1465 table_id: 519 name: created_by charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 9. row *************************** id: 7530 schema_id: 1465 table_id: 519 name: module_type charset: NULL coltype: enum is_signed: 0 enum_values: ["TEST"] column_length: NULL *************************** 10. row *************************** id: 7531 schema_id: 1465 table_id: 519 name: id charset: utf8mb4 coltype: varchar is_signed: 0 enum_values: NULL column_length: NULL *************************** 11. row *************************** id: 7532 schema_id: 1465 table_id: 519 name: created_time charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 12. row *************************** id: 7533 schema_id: 1465 table_id: 519 name: updated_time charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL *************************** 13. row *************************** id: 7534 schema_id: 1465 table_id: 519 name: tenant_id charset: utf8mb4 coltype: varchar is_signed: 0 enum_values: NULL column_length: NULL *************************** 14. row *************************** id: 7535 schema_id: 1465 table_id: 519 name: auth_id charset: utf8mb4 coltype: varchar is_signed: 0 enum_values: NULL column_length: NULL *************************** 15. row *************************** id: 7536 schema_id: 1465 table_id: 519 name: global_context_id charset: utf8mb4 coltype: varchar is_signed: 0 enum_values: NULL column_length: NULL *************************** 16. row *************************** id: 7537 schema_id: 1465 table_id: 519 name: version charset: NULL coltype: bigint is_signed: 1 enum_values: NULL column_length: NULL 16 rows in set (0.00 sec)
As can be seen here, as per maxwell, the datatype is varchar.
in maxwell.schemas, there are 26 entries for same table, all alters, but all have same binlog and position. So below is the entry with latest heartbeat. this shows datatype as bigint ` select * from maxwell.schemas | id | binlog_file | binlog_position | last_heartbeat_read | gtid_set | base_schema_id | deltas | server_id | position_sha | charset | version | deleted |
3266 | binlog.000129 | 89853139 | 1670582038626 | 06536c13-9fea-11ea-b1ec-0aea5c2a1d50:1-94880658,22d98918-6ad4-4f69-bdbf-1a01018f6550:1-192810486,23e289a8-1032-4d18-ae3b-774065c72da2:1-8133103,5841c312-0b74-11ea-949e-0240b5d03dd2:1-16003382,ff85281a-3a7d-11ed-8647-0a12595b8fce:1-2468 | 3254 | [{"type":"table-alter","database":"sample_db","table":"module","old":{"database":"sample_db","charset":"utf8mb4","table":"module","primary-key":["tenant_id","company_id","module_id"],"columns":[{"type":"varchar","name":"id","charset":"utf8mb4"},{"type":"bigint","name":"created_time","signed":true},{"type":"bigint","name":"updated_time","signed":true},{"type":"varchar","name":"tenant_id","charset":"utf8mb4"},{"type":"varchar","name":"global_context_id","charset":"utf8mb4"},{"type":"varchar","name":"auth_id","charset":"utf8mb4"},{"type":"bigint","name":"version","signed":true},{"type":"bit","name":"f_cdc"},{"type":"bigint","name":"company_id","signed":true},{"type":"bigint","name":"module_id","signed":true},{"type":"varchar","name":"name","charset":"utf8mb4"},{"type":"bigint","name":"module_version","signed":true},{"type":"bigint","name":"max_score","signed":true},{"type":"enum","name":"state","enum-values":["TEST"]},{"type":"bigint","name":"created_by","signed":true},{"type":"enum","name":"module_type","enum-values":["TEST"]}]},"def":{"database":"sample_db","charset":"utf8mb4","table":"module","primary-key":["tenant_id","company_id","module_id"],"columns":[{"type":"varchar","name":"id","charset":"utf8mb4"},{"type":"bigint","name":"created_time","signed":true},{"type":"bigint","name":"updated_time","signed":true},{"type":"bigint","name":"tenant_id","signed":true},{"type":"varchar","name":"global_context_id","charset":"utf8mb4"},{"type":"varchar","name":"auth_id","charset":"utf8mb4"},{"type":"bigint","name":"version","signed":true},{"type":"bit","name":"f_cdc"},{"type":"bigint","name":"company_id","signed":true},{"type":"bigint","name":"module_id","signed":true},{"type":"varchar","name":"name","charset":"utf8mb4"},{"type":"bigint","name":"module_version","signed":true},{"type":"bigint","name":"max_score","signed":true},{"type":"enum","name":"state","enum-values":["COPYING_FILES","COPYING_DB","DRAFT","PUBLISHED","ARCHIVED","DELETED","COPY_FAILED"]},{"type":"bigint","name":"created_by","signed":true},{"type":"enum","name":"module_type","enum-values":["TEST"]}]}}] | 2 | c66fea01ef29d6b2670bc17f7528fc30272992cd | utf8mb4 | 4 | 0 | `
@osheroff ^
@osheroff any suggestions here?
sorry been busy @ day job.
I can't see anything from what you sent. I'm wondering if you could mysqldump
your "maxwell" database and send it to me -- I'm ben dot osheroff at gmail dot com if you want to send me the file. Can provide a pgp key if you like.
will share after sanitizing the data
this issue is both for gathering information and also to raise the issues seen in a production environment. MySQL v8.0.23 Maxwell v1.33.0 I witnessed following data corruption in Maxwell :
Maxwell stopped with following error