ntent / kafka4net

C# client for Kafka
Apache License 2.0

Redesign error processing #5

Status: Open. vchekan opened this issue 9 years ago.

vchekan commented 9 years ago

Currently, error processing is focused on Kafka broker recovery. Connecting to a broker and fetching offsets are not reliable, and their failures are not handled properly, which leads to occasional random behavior.

vchekan commented 9 years ago

Context

The initial idea was to implement error recovery in a single place: the TCP connection object. This turned out to be too low-level, because depending on what the connection is being used for, we may want a different recovery strategy. For example, when RecoveryMonitor tests whether a broker is available, we do not want any recovery at all; we want to fail fast instead. Another issue is that currently only the fetcher and the producer are well protected, whereas earlier stages, such as connecting, offset resolution, and metadata fetching, are more fragile or have to implement their own recovery, which pollutes the code.
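One way to keep the strategy out of the connection object is to let each caller pass its own recovery policy. The C# sketch below is a minimal illustration under that assumption; `IRecoveryPolicy`, `FailFastPolicy`, `BackoffPolicy` and `Recovery.RunAsync` are hypothetical names, not part of kafka4net. A RecoveryMonitor-style availability probe would use `FailFastPolicy`, stages such as connecting or metadata fetching could use bounded exponential backoff, and the `CancellationToken` gives shutdown a way to interrupt a pending retry delay.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch (not kafka4net API): the caller, not the TCP
// connection, decides whether and how a failed operation is retried.
public interface IRecoveryPolicy
{
    // Returns the delay before the next attempt, or null to give up.
    TimeSpan? NextDelay(int attempt, Exception error);
}

// For availability probes such as RecoveryMonitor: never retry.
public sealed class FailFastPolicy : IRecoveryPolicy
{
    public TimeSpan? NextDelay(int attempt, Exception error) => null;
}

// For stages such as connecting, metadata fetching and offset resolution:
// exponential backoff, capped at maxDelay, for at most maxAttempts attempts.
public sealed class BackoffPolicy : IRecoveryPolicy
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _initialDelay;
    private readonly TimeSpan _maxDelay;

    public BackoffPolicy(int maxAttempts, TimeSpan initialDelay, TimeSpan maxDelay)
    {
        _maxAttempts = maxAttempts;
        _initialDelay = initialDelay;
        _maxDelay = maxDelay;
    }

    public TimeSpan? NextDelay(int attempt, Exception error)
    {
        if (attempt >= _maxAttempts) return null;
        double ms = _initialDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);
        return TimeSpan.FromMilliseconds(Math.Min(ms, _maxDelay.TotalMilliseconds));
    }
}

public static class Recovery
{
    // Runs the operation and consults the caller's policy after each failure.
    // Cancellation (e.g. shutdown) is never retried and aborts the delay.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation, IRecoveryPolicy policy, CancellationToken ct = default)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation().ConfigureAwait(false);
            }
            catch (Exception ex) when (!(ex is OperationCanceledException))
            {
                TimeSpan? delay = policy.NextDelay(attempt, ex);
                if (delay == null) throw; // policy says give up: surface the original error
                await Task.Delay(delay.Value, ct).ConfigureAwait(false);
            }
        }
    }
}
```

Keeping the retry loop outside the connection means the connection only reports errors; whether to retry is decided by whoever issued the request.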

Requirements

- Fast reaction: consider a partition failed immediately after a TCP error, not after many retries.
- Fail all request tasks that are waiting for responses.
- While waiting for recovery, watch for changes in metadata.
- Work nicely with the shutdown and drain logic.
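The first two requirements can be sketched as a registry of in-flight requests keyed by correlation id (every Kafka request carries a correlation id that the broker echoes back in its response). This is a minimal sketch, assuming one registry per TCP connection; `PendingRequests`, `Register`, `Complete` and `FailAll` are hypothetical names, not kafka4net's API. On the first TCP error the read loop would call `FailAll`, which faults every waiting task at once instead of letting each one time out or retry.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical sketch (not kafka4net API): in-flight requests of one TCP
// connection, keyed by Kafka correlation id, failed together on first error.
public sealed class PendingRequests
{
    private readonly ConcurrentDictionary<int, TaskCompletionSource<byte[]>> _pending =
        new ConcurrentDictionary<int, TaskCompletionSource<byte[]>>();
    private volatile Exception _failure;

    public bool IsFailed => _failure != null;

    // Registers a request before it is written to the socket.
    public Task<byte[]> Register(int correlationId)
    {
        var tcs = new TaskCompletionSource<byte[]>(TaskCreationOptions.RunContinuationsAsynchronously);
        Exception failure = _failure;
        if (failure != null)
        {
            // Connection already failed: fail immediately, no waiting.
            tcs.SetException(failure);
            return tcs.Task;
        }
        if (!_pending.TryAdd(correlationId, tcs))
            throw new InvalidOperationException($"Duplicate correlation id {correlationId}");
        // Re-check: FailAll may have run between the first check and TryAdd.
        failure = _failure;
        if (failure != null && _pending.TryRemove(correlationId, out var raced))
            raced.TrySetException(failure);
        return tcs.Task;
    }

    // Called by the read loop when a response with this correlation id arrives.
    public void Complete(int correlationId, byte[] response)
    {
        if (_pending.TryRemove(correlationId, out var tcs))
            tcs.TrySetResult(response);
    }

    // Called on the first TCP error: no retries here, every waiter fails now.
    public void FailAll(Exception error)
    {
        _failure = error;
        foreach (int id in _pending.Keys) // Keys returns a snapshot
            if (_pending.TryRemove(id, out var tcs))
                tcs.TrySetException(error);
    }
}
```

Requests registered after the failure fault immediately, so each caller can hand the error to its own recovery policy right away.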

List of failure stages