zalando / nakadi

A distributed event bus that implements a RESTful API abstraction on top of Kafka-like queues
https://nakadi.io
MIT License
952 stars 292 forks source link

Relax Kafka producer reset cases #1530

Closed a1exsh closed 1 year ago

a1exsh commented 1 year ago

Neither NotLeaderOrFollower nor Timeout is a good reason for resetting the producer.

Jan-M commented 1 year ago

:+1:

1u0 commented 1 year ago

:+1:

adyach commented 1 year ago

I agree those exceptions should not recreate producer, but it is not clear how you decided on what should be left. For example NotLeaderForPartitionException, NetworkException do not have to trigger producer recreation either. Those are transient errors, producer will refresh metadata and the issue will be resolved. The left exception is only `UnknownServerException. The name suggest (and code as well) that it is not producer side error which means there is no reason to recreate the producer, it simply won't help, because the error is on the server side.

I do not completely remember why the list of the exceptions is like that, that's why I suggest to remove that section completely and validate that it works as expected.

a1exsh commented 1 year ago

Eventually we may remove it completely, as I also don't see any specific reason for closing existing producer. The documentation mentions one or two cases where this may be necessary, when working with transactions or using idempotent producer, which is not the case for us.

At the same time, I suppose the special handling there for a reason, and for now I have removed the ones that I could test locally and have seen that the producer can be used further after simply catching one of them.

a1exsh commented 1 year ago

:+1: