zettadb / kunlun-storage-8.0.26


Heartbeat event log_pos may overflow and cause the IO thread to stop #9

Open jd-zhang opened 2 years ago

jd-zhang commented 2 years ago

Issue migrated from trac ticket #736

component: kunlun-storage | priority: major

2022-05-25 16:01:14: zhaowei@zettadb.com created the issue


The heartbeat binlog event carries a 32-bit log_pos (file offset), which overflows in the rare case that a binlog file grows beyond 4GB. The replica then stops its IO thread with these errors:

2022-05-25T11:26:59.338735+08:00 85 [ERROR] [MY-013118] [Repl] Slave I/O for channel '': Unexpected master's heartbeat data: heartbeat is not compatible with local info; the event's data: log_file_name binlog.000176 log_pos 1197659012, Error_code: MY-013118
2022-05-25T11:26:59.338754+08:00 85 [ERROR] [MY-013122] [Repl] Slave I/O for channel '': Relay log write failure: could not queue event from master, Error_code: MY-013122
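To make the overflow concrete, here is a small C++ sketch (kunlun-storage appears to be based on MySQL 8.0.26, so C++ fits). The helper names `header_log_pos` and `heartbeat_rejected` are hypothetical, and the rejection rule is only a loose model of the replica-side check, not the server's actual code. The assumption that binlog.000176 wrapped exactly once is also ours; the report does not give the real file size.

```cpp
#include <cstdint>

// Hypothetical helper: the common binlog event header stores log_pos as a
// 32-bit unsigned integer, so a file offset of 4 GiB (2^32 bytes) or more
// keeps only its low 32 bits, i.e. real_pos mod 2^32.
constexpr std::uint32_t header_log_pos(std::uint64_t real_pos) {
  return static_cast<std::uint32_t>(real_pos);
}

// Hypothetical helper, loosely modelled on the replica-side check: a
// heartbeat for the same binlog file that reports a position behind the one
// the replica has already reached is treated as inconsistent. The exact
// condition used in queue_event() may differ.
constexpr bool heartbeat_rejected(std::uint64_t replica_pos,
                                  std::uint32_t heartbeat_pos) {
  return replica_pos > static_cast<std::uint64_t>(heartbeat_pos);
}

// Assumption: binlog.000176 wrapped exactly once, so the real offset behind
// the reported log_pos 1197659012 would be 2^32 + 1197659012.
constexpr std::uint64_t kAssumedRealPos =
    (std::uint64_t{1} << 32) + 1197659012ULL;
static_assert(kAssumedRealPos == 5492626308ULL, "2^32 + 1197659012");
static_assert(header_log_pos(kAssumedRealPos) == 1197659012u,
              "the header keeps only the low 32 bits");
static_assert(heartbeat_rejected(kAssumedRealPos,
                                 header_log_pos(kAssumedRealPos)),
              "the truncated heartbeat looks like it is from the past");
```

The `static_assert`s check the arithmetic at compile time: under the single-wrap assumption, the real offset 5492626308 is reported as 1197659012, which is the value in the error message above.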


jd-zhang commented 2 years ago

2022-05-25 16:06:09: zhaowei@zettadb.com commented


The log_pos field in the binlog event header is a uint32. To minimize the impact of changing the event format, we append the complete (untruncated) log_pos to the end of the heartbeat event and use it to verify heartbeat events in queue_event().
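The approach can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual patch: the body layout (binlog file name followed by the full 64-bit log_pos as 8 little-endian bytes), the names `encode_heartbeat_body`, `decode_heartbeat_body`, `HeartbeatInfo` and `heartbeat_consistent`, and the rule that any checksum bytes are already stripped are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout (assumption, not the real wire format): heartbeat
// body = binlog file name bytes, then the full 64-bit log_pos as 8
// little-endian bytes. The 32-bit header log_pos itself is left unchanged.
constexpr std::size_t kFullPosLen = 8;

std::vector<std::uint8_t> encode_heartbeat_body(const std::string& log_name,
                                                std::uint64_t full_pos) {
  std::vector<std::uint8_t> body(log_name.begin(), log_name.end());
  for (std::size_t i = 0; i < kFullPosLen; ++i)
    body.push_back(static_cast<std::uint8_t>(full_pos >> (8 * i)));
  return body;
}

struct HeartbeatInfo {
  std::string log_name;
  std::uint64_t full_pos;
};

// Assumes any checksum has already been removed from `body`. Returns false
// if the body is too short to hold the appended position.
bool decode_heartbeat_body(const std::vector<std::uint8_t>& body,
                           HeartbeatInfo* out) {
  assert(out != nullptr);
  if (body.size() < kFullPosLen) return false;
  const std::size_t name_len = body.size() - kFullPosLen;
  out->log_name.assign(body.begin(),
                       body.begin() + static_cast<std::ptrdiff_t>(name_len));
  std::uint64_t pos = 0;
  for (std::size_t i = 0; i < kFullPosLen; ++i)
    pos |= static_cast<std::uint64_t>(body[name_len + i]) << (8 * i);
  out->full_pos = pos;
  return true;
}

// Verification compares against the untruncated position, so a binlog file
// larger than 4 GiB no longer makes the heartbeat look stale.
bool heartbeat_consistent(const std::string& replica_log_name,
                          std::uint64_t replica_pos,
                          const HeartbeatInfo& hb) {
  return replica_log_name == hb.log_name && replica_pos <= hb.full_pos;
}
```

Keeping the common header's log_pos at 32 bits means no other event type changes; only heartbeat events grow by 8 bytes. Because both sides must agree on those extra bytes, a scheme like this presumes the source and the replica run the same patched build.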

jd-zhang commented 2 years ago

2022-06-09 09:51:20: zhaowei@zettadb.com commented


How to reproduce:

Write a transaction that produces more than 4GB of binlog data, e.g.:

create table t1(a serial primary key, b text); 
insert into t1 (b) values(repeat('a',10000000000));