zendesk / maxwell

Maxwell's daemon, a mysql-to-json kafka producer
https://maxwells-daemon.io/
Other
4.04k stars 1.01k forks source link

Maxwell not responding post disconnection #1753

Open pramess1 opened 3 years ago

pramess1 commented 3 years ago

Maxwell behaves weirdly at times. When MySql loads high for, roughly around 10 minutes, the CPU utilization goes to 100% and maxwell stops reading the binlogs. Now, when I try to restart, below are the logs I can see. Nothing works post the above scenario. I need to delete the maxwell database and create it once again to make it work.

Logs at the time of crash: (When CPU usage is 100%): 2021-09-28T20:33:26.633+05:30 15:03:26,631 INFO LogInputStreamReader - [2021-09-28 15:03:26.630957] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:112] Stage 2 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 } 2021-09-28T20:33:26.633+05:30 15:03:26,631 INFO LogInputStreamReader - [2021-09-28 15:03:26.630966] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:127] (maxwell-stream) Average Processing Time: -nan ms 2021-09-28T20:33:41.631+05:30 15:03:41,631 INFO LogInputStreamReader - [2021-09-28 15:03:41.631057] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:109] Stage 1 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, UserRecords: 0, KinesisRecords: 0 } 2021-09-28T20:33:41.631+05:30 15:03:41,631 INFO LogInputStreamReader - [2021-09-28 15:03:41.631117] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:112] Stage 2 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 } 2021-09-28T20:33:41.631+05:30 15:03:41,631 INFO LogInputStreamReader - [2021-09-28 15:03:41.631128] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:127] (maxwell-stream) Average Processing Time: -nan ms 2021-09-28T20:33:54.935+05:30 15:03:54,935 INFO BinaryLogClient - Trying to restore lost connection to {hostname}:3306 2021-09-28T20:33:56.631+05:30 15:03:56,631 INFO LogInputStreamReader - [2021-09-28 15:03:56.631222] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:109] Stage 1 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, UserRecords: 0, KinesisRecords: 0 } 2021-09-28T20:33:56.631+05:30 15:03:56,631 INFO LogInputStreamReader - [2021-09-28 15:03:56.631280] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:112] Stage 2 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 } 2021-09-28T20:33:56.631+05:30 15:03:56,631 INFO LogInputStreamReader - [2021-09-28 15:03:56.631292] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:127] (maxwell-stream) Average Processing Time: -nan ms 2021-09-28T20:34:11.631+05:30 15:04:11,631 INFO LogInputStreamReader - [2021-09-28 15:04:11.631388] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:109] Stage 1 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, UserRecords: 0, KinesisRecords: 0 } 2021-09-28T20:34:11.631+05:30 15:04:11,631 INFO LogInputStreamReader - [2021-09-28 15:04:11.631447] [0x00000037][0x00007f571bee2700] [info] [processing_statistics_logger.cc:112] Stage 2 Triggers: { stream: 'maxwell-stream', manual: 0, count: 0, size: 0, matches: 0, timed: 0, KinesisRecords: 0, PutRecords: 0 }

Logs when trying to restart maxwell: Using kafka version: 1.0.0 15:37:31,869 WARN MaxwellMetrics - Metrics will not be exposed: metricsReportingType not configured. 15:37:33,438 INFO KinesisProducerConfiguration - Attempting to load config from file kinesis-producer-library.properties 15:37:33,533 WARN KinesisProducerConfiguration - Property Port ignored as there is no corresponding set method in KinesisProducerConfiguration 15:37:33,551 INFO KinesisProducer - Extracting binaries to /tmp/amazon-kinesis-producer-native-binaries 15:37:34,784 INFO Maxwell - Maxwell v1.26.2 is booting (MaxwellKinesisProducer), starting at Position[BinlogPosition[mysql-bin-changelog.507860:21033693], lastHeartbeat=1632841035939] 15:37:34,792 INFO Daemon - Asking for trace 15:37:35,069 INFO LogInputStreamReader - [2021-09-28 15:37:35.052463] [0x00000038][0x00007fc6b622f7c0] [info] [logging.cc:83] Set boost log level to info 15:37:35,069 INFO LogInputStreamReader - [2021-09-28 15:37:35.052577] [0x00000038][0x00007fc6b622f7c0] [info] [logging.cc:170] Set AWS Log Level to WARN 15:37:35,069 INFO LogInputStreamReader - [2021-09-28 15:37:35.057768] [0x00000038][0x00007fc6b622f7c0] [info] [main.cc:346] Setting CA path to /tmp/amazon-kinesis-producer-native-binaries/cacerts 15:37:35,070 INFO LogInputStreamReader - [2021-09-28 15:37:35.057792] [0x00000038][0x00007fc6b622f7c0] [info] [main.cc:382] Starting up main producer 15:37:35,070 INFO LogInputStreamReader - [2021-09-28 15:37:35.057819] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:101] Using Region: default 15:37:35,070 INFO LogInputStreamReader - [2021-09-28 15:37:35.057832] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:120] Using per request threading model. 15:37:35,073 INFO LogInputStreamReader - [2021-09-28 15:37:35.057843] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:52] Using default Kinesis endpoint 15:37:35,073 INFO LogInputStreamReader - [2021-09-28 15:37:35.057883] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:101] Using Region: default 15:37:35,074 INFO LogInputStreamReader - [2021-09-28 15:37:35.057895] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:120] Using per request threading model. 15:37:35,074 INFO LogInputStreamReader - [2021-09-28 15:37:35.057905] [0x00000038][0x00007fc6b622f7c0] [info] [kinesis_producer.cc:52] Using default CloudWatch endpoint 15:37:35,074 INFO LogInputStreamReader - [2021-09-28 15:37:35.057982] [0x00000038][0x00007fc6b622f7c0] [info] [main.cc:393] Entering join 15:37:35,343 INFO MysqlSavedSchema - Restoring schema id 1 (last modified at Position[BinlogPosition[mysql-bin-changelog.507420:9579750], lastHeartbeat=0]) 15:37:35,861 INFO BinlogConnectorReplicator - Setting initial binlog pos to: mysql-bin-changelog.507860:21033693 15:37:35,930 INFO BinaryLogClient - Connected to {host}:3306 at mysql-bin-changelog.507860/21033693 (sid:6379, cid:2474106302) 15:37:35,930 INFO BinlogConnectorLifecycleListener - Binlog connected. 15:43:35,943 INFO BinaryLogClient - Trying to restore lost connection to {host}:3306

osheroff commented 3 years ago

can you dump some binlog events starting at mysql-bin-changelog.507860:21033693? you can use show binlog events in 'binlog.000074' from [POS] limit [LIMIT]

if you can reproduce this reliably, can you capture a stack trace from the stuck process with jstack?

pramess1 commented 3 years ago

Sorry, unable to dump binloag events, since the retention period is only 4 hrs, which is already over. Also, this happened in prod env, unable to reproduce in local env.

osheroff commented 3 years ago

all of that is unfortunate. could be an out-of-memory scenario but it's really hard to tell without a reliable reproduction.

Average Processing Time: -nan ms seems like an odd log message.

osheroff commented 3 years ago

https://github.com/awslabs/amazon-kinesis-producer/issues/183

pramess1 commented 2 years ago

Finally figured the problem in prod env, if a record size exceeds 1 mb, then it crashes and leaves this odd log message. Reducing the row size to less than 1 mb, resolves this issue.