mailgun / kafka-pixy

gRPC/REST proxy for Kafka
Apache License 2.0

Subscription to a topic fails indefinitely after ZooKeeper connection loss #124

horkhe closed 6 years ago

horkhe commented 6 years ago

The following sequence of events was observed:

  1. The subscriber actor failed to subscribe to a new topic and reported the error "zk: could not connect to a server";
  2. More than a minute after the first error, a second one appeared, this time "zk: node does not exist";
  3. From that moment on, the error "zk: node does not exist" kept reappearing every 500 ms.

My theory is that, after the initial connection error, the ZooKeeper client took so long to re-establish the connection that our ephemeral registration node expired, yet the subscriber kept acting as if the node still existed. To fix this, the subscriber needs to handle a missing registration node gracefully.
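
A rough sketch of what handling the missing registration could look like is below. This is not the actual kafka-pixy code and it does not use the real ZooKeeper client: `registry`, `fakeRegistry`, `submitTopics`, and `errNoNode` are hypothetical names made up for illustration, with `errNoNode` standing in for the client's "node does not exist" error. The idea is only that an update which finds the node missing falls back to creating the registration again instead of retrying the same update forever.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoNode is a hypothetical stand-in for the ZooKeeper client's
// "node does not exist" error.
var errNoNode = errors.New("zk: node does not exist")

// registry is a hypothetical abstraction over the member registration node.
type registry interface {
	// Create writes a new registration node holding the given topics.
	Create(memberID string, topics []string) error
	// Set overwrites an existing registration node and returns errNoNode
	// if the node is missing.
	Set(memberID string, topics []string) error
}

// submitTopics updates the registration and, if the node has disappeared
// (for example because the session expired), registers the member again.
func submitTopics(r registry, memberID string, topics []string) error {
	err := r.Set(memberID, topics)
	if errors.Is(err, errNoNode) {
		// The registration is gone: re-create it rather than retrying Set.
		return r.Create(memberID, topics)
	}
	return err
}

// fakeRegistry is an in-memory registry used only to exercise the sketch.
type fakeRegistry struct {
	nodes map[string][]string
}

func (f *fakeRegistry) Create(memberID string, topics []string) error {
	f.nodes[memberID] = topics
	return nil
}

func (f *fakeRegistry) Set(memberID string, topics []string) error {
	if _, ok := f.nodes[memberID]; !ok {
		return errNoNode
	}
	f.nodes[memberID] = topics
	return nil
}

func main() {
	// An empty registry mimics an expired registration node.
	r := &fakeRegistry{nodes: map[string][]string{}}
	if err := submitTopics(r, "member-1", []string{"topic-a"}); err != nil {
		fmt.Println("subscribe failed:", err)
		return
	}
	fmt.Println("registered topics:", r.nodes["member-1"])
}
```

A real fix would also have to deal with the create call racing against another registration attempt and with the connection still being down, which this sketch leaves out.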