zendesk / maxwell

Maxwell's daemon, a mysql-to-json kafka producer
https://maxwells-daemon.io/

Data loss #1443

Open dbhatia2 opened 4 years ago

dbhatia2 commented 4 years ago

Hello experts, thanks for making a great product. We are observing that data is not being produced by Maxwell into Kafka for a MySQL database; we need at-least-once delivery. Here are the details.

Maxwell version: maxwell-1.22.3

Maxwell configs in place:

```
kafka.acks = all
kafka.retries = 5
ignore_producer_error = false
```

Note: min.insync.replicas is not set, on the assumption that the write would fail if any of the replicas is unavailable.

Topic config:

```
Topic: topic1  PartitionCount: 97  ReplicationFactor: 3  Configs: compression.type=uncompressed,segment.bytes=1073741824,retention.ms=604800000,retention.bytes=-1
    Topic: topic1  Partition: 0   Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
    Topic: topic1  Partition: 1   Leader: 2  Replicas: 2,0,1  Isr: 2,0,1
    Topic: topic1  Partition: 2   Leader: 0  Replicas: 0,1,2  Isr: 0,1,2
    ...
    Topic: topic1  Partition: 96  Leader: 1  Replicas: 1,2,0  Isr: 1,2,0
```
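For context: with kafka.acks = all, the number of in-sync replicas that must acknowledge a write is governed by the topic/broker setting min.insync.replicas; if it is left at the broker default of 1, a write can be acknowledged by the leader alone once the ISR shrinks. Below is a minimal sketch of checking the effective value on topic1 with the Kafka AdminClient, assuming a placeholder broker address:

```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class CheckMinInsyncReplicas {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker list

        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "topic1");
            Config config = admin.describeConfigs(Collections.singleton(topic))
                                 .all().get().get(topic);
            // With acks=all, min.insync.replicas is the minimum ISR size required for a
            // write to be accepted; the broker default is 1.
            System.out.println(config.get("min.insync.replicas"));
        }
    }
}
```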

The IDs written around the gap (the middle row is missing in Kafka):

```
CreatedDate           ID
2020-03-13 03:25:17   2847445
2020-03-13 03:25:31   2847446   <-- missing ID
2020-03-13 03:26:00   2847447
```

The data is present in the binary log:

```
691045746-#200313  3:25:17 server id 10178193  end_log_pos 264264662 CRC32 0x6137aca4  Write_rows: table id 519 flags: STMT_END_F
691045866-### INSERT INTO `schema1`.`table1`
691045907-### SET
691045915:###   @1=2847445 /* INT meta=0 nullable=0 is_null=0 */
691045970-###   @2=331 /* INT meta=0 nullable=0 is_null=0 */
691046021-###   @3=NULL /* INT meta=0 nullable=1 is_null=1 */
691046073-###   @4=595 /* INT meta=0 nullable=1 is_null=0 */
...

** Missing in Kafka **
693742176-#200313  3:25:30 server id 10178193  end_log_pos 264950076 CRC32 0x4f03bf82  Write_rows: table id 519 flags: STMT_END_F
693742296-### INSERT INTO `schema1`.`table1`
693742337-### SET
693742345:###   @1=2847446 /* INT meta=0 nullable=0 is_null=0 */
693742400-###   @2=16 /* INT meta=0 nullable=0 is_null=0 */
693742450-###   @3=43401 /* INT meta=0 nullable=1 is_null=0 */
693742503-###   @4=184 /* INT meta=0 nullable=1 is_null=0 */
...

699292938-#200313  3:25:59 server id 10178193  end_log_pos 266290032 CRC32 0x02abffc9  Write_rows: table id 519 flags: STMT_END_F
699293058-### INSERT INTO `schema1`.`table1`
699293099-### SET
699293107:###   @1=2847447 /* INT meta=0 nullable=0 is_null=0 */
699293162-###   @2=87 /* INT meta=0 nullable=0 is_null=0 */
699293212-###   @3=NULL /* INT meta=0 nullable=1 is_null=1 */
699293264-###   @4=68 /* INT meta=0 nullable=1 is_null=0 */
...
```

Note: we have implemented the change from https://github.com/zendesk/maxwell/issues/838 to exclude XA transactions.

Other observation:

dbhatia2 commented 4 years ago

@osheroff Can you please guide us on this? Does Maxwell capture the Kafka ack response, i.e. the Kafka offset/partition from the ack?
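For reference, the stock kafka-clients producer reports the acknowledged partition and offset of each record through the send callback; a minimal sketch, with the broker address, topic, key, and value as placeholders rather than Maxwell's actual payload:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AckLoggingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker list
        props.put("acks", "all");
        props.put("retries", 5);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("topic1", "schema1.table1", "{\"id\": 2847446}");

            // The callback receives the acked partition/offset on success,
            // or the exception if the send ultimately failed.
            producer.send(record, (RecordMetadata metadata, Exception e) -> {
                if (e != null) {
                    System.err.println("send failed: " + e);
                } else {
                    System.out.printf("acked partition=%d offset=%d%n",
                        metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes pending sends, so the callback fires before exit
    }
}
```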

osheroff commented 4 years ago

This could be related to #838. It could also be something else. Let's do a little triage here:

  1. If you look more in the binlog, was this part of an XA transaction? If so, were they all in the same XA transaction?
  2. Would the rows have been routed to the same partition? Was there any kafka leadership change or other problem on that partition at the time of the data loss? (See the partition-routing sketch after this list.)
  3. Has this repro'ed again?
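Regarding point 2, here is a minimal sketch of checking whether a set of keys would route to the same partition, assuming the stock Kafka DefaultPartitioner behavior (murmur2 hash of the key modulo the partition count); the key strings are hypothetical, since Maxwell's actual key format and partitioner depend on its partitioning configuration:

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.utils.Utils;

public class PartitionRoutingCheck {
    // Mirrors the default Kafka partitioner's keyed routing:
    // murmur2(keyBytes) mapped into [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 97; // matches the topic described above

        // Hypothetical keys for the three rows around the gap.
        String[] keys = {
            "schema1.table1.2847445",
            "schema1.table1.2847446",
            "schema1.table1.2847447",
        };
        for (String key : keys) {
            System.out.printf("%s -> partition %d%n", key, partitionFor(key, numPartitions));
        }
    }
}
```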
dbhatia2 commented 4 years ago

Thanks @osheroff

  1. No, they are not part of XA transactions. This is from the binary logs:

     ** Missing in Kafka **
     693742176-#200313  3:25:30 server id 10178193  end_log_pos 264950076 CRC32 0x4f03bf82  Write_rows: table id 519 flags: STMT_END_F
     693742296-### INSERT INTO `schema1`.`table1`
     693742337-### SET
     693742345:###   @1=2847446 /* INT meta=0 nullable=0 is_null=0 */
     693742400-###   @2=16 /* INT meta=0 nullable=0 is_null=0 */
     693742450-###   @3=43401 /* INT meta=0 nullable=1 is_null=0 */
     693742503-###   @4=184 /* INT meta=0 nullable=1 is_null=0 */
  2. The rows before and after this one were routed to the same partition, i.e. partition 94. The row above did not reach Kafka.

  3. I have updated my Maxwell with the https://github.com/zhangdove/maxwell repo (#1358), as it does not skip XA transactions, whereas the #838 patch skips them. I will keep a close eye on the system to see if similar issues occur again.

osheroff commented 4 years ago

Were there any kafka leadership changes or anything interesting in the kafka logs around the time of the data loss?

Bruce2jiang commented 4 years ago

@dbhatia2 Hello, has this problem been solved?