uber / uReplicator

Improvement of Apache Kafka Mirrormaker
Apache License 2.0

Error when sending message to topic with key: null #253

Open bprise-dev opened 5 years ago

bprise-dev commented 5 years ago

We are facing issues while replicating Kafka data for one topic, with about 6 million records, from one datacenter to the other. Replication works fine for a couple of minutes after starting, then the worker process fails with the following error.

Worker logs are as below:

[2019-09-11 06:04:14,137] INFO Closing producer due to send failure. (kafka.mirrormaker.WorkerInstance:66)
[2019-09-11 06:04:14,137] INFO [Producer clientId=ureplicator] Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer:1054)
[2019-09-11 06:04:14,137] INFO [Producer clientId=ureplicator] Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. (org.apache.kafka.clients.producer.KafkaProducer:1078)
[2019-09-11 06:04:14,137] ERROR Error when sending message to topic requestlog with key: null, value: 1279 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52)
java.lang.IllegalStateException: Producer is closed forcefully.
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:696)
    at org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:683)
    at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:185)
    at java.lang.Thread.run(Thread.java:748)


Producer configs are as follows:

compression.type=gzip
batch.size=4000
linger.ms=1000
buffer.memory=167772160
max.request.size=31457280
send.buffer.bytes=62914560
max.in.flight.requests.per.connection=50
delivery.timeout.ms=600000
request.timeout.ms=120000
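For reference, here is a minimal sketch of how these settings map onto a Java KafkaProducer. The bootstrap address and byte-array serializers are assumptions for illustration, not values from the uReplicator config:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ReplicatorProducerConfig {
    public static KafkaProducer<byte[], byte[]> build() {
        Properties props = new Properties();
        // Placeholder address; the real value comes from the uReplicator deployment.
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "dest-broker:9092");
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 4000);            // bytes per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 1000);             // wait up to 1 s to fill a batch
        props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 167772160L);   // 160 MiB accumulator
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 31457280);  // 30 MiB per request
        props.put(ProducerConfig.SEND_BUFFER_CONFIG, 62914560);       // 60 MiB socket send buffer
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 50);
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 600000); // 10 min total delivery budget (Kafka 2.1+)
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 120000);  // 2 min per broker request
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        return new KafkaProducer<>(props);
    }
}
```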

xhl1988 commented 5 years ago

Can you find the first ERROR message? That message will indicate what caused the producer to be closed. The error message above is expected once the producer is already closed.

bprise-dev commented 5 years ago

The expired-records log below repeats, and then the producer goes down.

[2019-09-11 05:27:06,671] INFO [Producer clientId=ureplicator] Closing the Kafka producer with timeoutMillis = 0 ms. (org.apache.kafka.clients.producer.KafkaProducer:1054)
[2019-09-11 05:27:06,671] INFO [Producer clientId=ureplicator] Proceeding to force close the producer since pending requests could not be completed within timeout 0 ms. (org.apache.kafka.clients.producer.KafkaProducer:1078)
[2019-09-11 05:27:06,671] ERROR Error when sending message to topic requestlog with key: null, value: 1218 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback:52)
org.apache.kafka.common.errors.TimeoutException: Expiring 3 record(s) for requestlog-3: 120001 ms has passed since last append
[2019-09-11 05:27:06,671] INFO Closing producer due to send failure. (kafka.mirrormaker.WorkerInstance:66)
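For context on how this surfaces: the TimeoutException is delivered through the producer's send callback when a batch sits in the accumulator longer than the timeout allows, and uReplicator's WorkerInstance reacts to the failed send by closing the producer. A minimal sketch, assuming a byte-array producer and the topic name from the logs (the handler logic is illustrative, not uReplicator's actual code):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.TimeoutException;

public class SendWithTimeoutHandling {
    public static void send(KafkaProducer<byte[], byte[]> producer, byte[] value) {
        // Key is null, as in the reported error, so records are spread across partitions.
        ProducerRecord<byte[], byte[]> record = new ProducerRecord<>("requestlog", null, value);
        producer.send(record, (metadata, exception) -> {
            if (exception instanceof TimeoutException) {
                // The batch expired in the accumulator before the broker acknowledged it,
                // producing "Expiring N record(s) ... ms has passed since last append".
                System.err.println("Record expired before delivery: " + exception.getMessage());
            } else if (exception != null) {
                // Any other send failure; uReplicator's worker closes the producer on
                // failure, which then triggers the "Producer is closed forcefully" errors.
                System.err.println("Send failed: " + exception.getMessage());
            }
        });
    }
}
```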

xhl1988 commented 5 years ago

This message shows a request timeout. It means either the broker has high latency or it is unavailable. Or do you see any other error before this? (If possible, can you provide the full log?) Which branch are you using now?

@yangy0000 I recall that we also saw timeouts before, but they were caused by other errors first?

bprise-dev commented 5 years ago

The logs I have posted are the only errors in the logs. Moreover, we do have high latency between the clusters. I increased request.timeout.ms to a higher value, and the worker has been running smoothly since. Is this possibly the reason we faced errors earlier? We have a high-throughput source Kafka cluster with somewhat high latency to the destination Kafka cluster.

xhl1988 commented 5 years ago

I guess latency is the cause. However, 2 minutes does not seem like a reasonable network latency.

How high is the latency between the clusters?

Try tuning the producer configs to compensate for the latency, e.g., increase batch size, linger time, etc.; also consider reducing max.in.flight.requests.per.connection, as 50 might be too large.
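To make that concrete, one illustrative direction for the settings posted above; the exact values are guesses to be tuned against the observed cross-DC latency, not recommendations from this thread:

```properties
compression.type=gzip
# Larger batches amortize the cross-DC round trip.
batch.size=131072
# Wait longer to fill batches over a slow link.
linger.ms=5000
# Fewer concurrent requests per broker connection.
max.in.flight.requests.per.connection=5
# More headroom when cross-DC latency is high.
request.timeout.ms=300000
delivery.timeout.ms=600000
```

Note that the producer requires delivery.timeout.ms to be at least linger.ms + request.timeout.ms, so raising request.timeout.ms further may require raising delivery.timeout.ms as well.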