shotover / shotover-proxy

L7 data-layer proxy
https://docs.shotover.io
Apache License 2.0

OUT_OF_ORDER_SEQUENCE_NUMBER errors when producing to PSC cluster #1739

Closed yuzhouchen-instaclustr closed 2 days ago

yuzhouchen-instaclustr commented 2 weeks ago

Describe the bug

Using a Java client to produce messages to a topic via PSC endpoints and Shotover yields the following errors:

[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.producer.internals.Sender - [Producer clientId=producer-1] Got error produce response with correlation id 9 on topic-partition small-topic-1, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER

[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.producer.internals.Sender - [Producer clientId=producer-1] Received invalid metadata error in produce request on partition small-topic-1 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now

[kafka-producer-network-thread | producer-1] WARN org.apache.kafka.clients.producer.internals.Sender - [Producer clientId=producer-1] Got error produce response with correlation id 11 on topic-partition small-topic-0, retrying (2147483645 attempts left). Error: OUT_OF_ORDER_SEQUENCE_NUMBER

These errors cause message production to fail. The problem appears randomly, so the reproduction steps below are not guaranteed to trigger the behaviour above.

To Reproduce

Steps to reproduce the behavior:

  1. Set up a PSC cluster. I was using SASL_PLAINTEXT for authentication between the client and Shotover; I am not sure whether that is relevant.
  2. Use a Java client to produce messages with key and value to a topic
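
For anyone trying to reproduce this, the client settings implied by the steps above can be sketched as follows. This is only an illustration, not the configuration actually used: the issue does not state the SASL mechanism (PLAIN is assumed here), the bootstrap address and credentials are placeholders, and the `ProducerConfigSketch` class and its `build` method are hypothetical names. Idempotence is set explicitly because OUT_OF_ORDER_SEQUENCE_NUMBER is reported for producers that send sequence numbers.

```java
import java.util.Properties;

// Hypothetical helper: builds producer settings matching the setup described in the steps.
class ProducerConfigSketch {
    static Properties build(String bootstrapServers, String username, String password) {
        Properties props = new Properties();
        // Address of the Shotover / PSC endpoint (placeholder supplied by the caller).
        props.setProperty("bootstrap.servers", bootstrapServers);
        // SASL_PLAINTEXT between client and Shotover, as stated in step 1.
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        // Assumption: the issue does not say which SASL mechanism was used.
        props.setProperty("sasl.mechanism", "PLAIN");
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"" + username + "\" password=\"" + password + "\";");
        // Messages are produced with a key and a value (step 2); string types are assumed.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotent producers attach sequence numbers to each batch.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder endpoint and credentials, not taken from the issue.
        Properties props = build("shotover.example.invalid:9092", "user", "secret");
        for (String name : props.stringPropertyNames()) {
            System.out.println(name);
        }
    }
}
```

With the `kafka-clients` dependency on the classpath, these properties would be passed to `new KafkaProducer<String, String>(props)`, and each message sent as a `ProducerRecord` carrying a topic, key, and value.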

Expected behaviour

Messages are produced successfully.

Systems and Version:

rukai commented 2 days ago

I was not able to reproduce this. I believe this was encountered on an old version of shotover and it has been fixed in the latest main branch.