Handle the case where deletion requests can't go through

wushujames commented 5 years ago

The current code pops the deletion request off the queue and then tries to work on it. If the request fails to be submitted to the broker for some reason (an intermittent error, say), we never try it again.

Possible solutions:

put it back on the queue again
peek() the queue, attempt delete, and only take() from the queue when we succeed

Putting it back on the queue seems simplest.
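
For illustration, here is a rough sketch of both options. It is not the project's actual code: the class name, the method names, and the `submitDeletion` predicate (standing in for whatever call submits the deletion to the broker, returning true on success) are all made up for this example. It also uses the non-blocking `poll()` in place of `take()` so that the sketch never blocks.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Hypothetical sketch; names are illustrative, not from kafka-topic-manager.
final class DeletionRetrySketch {

    /** Option 1: remove the request, and put it back on the queue if submission fails. */
    static boolean takeAndRequeueOnFailure(BlockingQueue<String> queue,
                                           Predicate<String> submitDeletion) {
        String topic = queue.poll();        // non-blocking stand-in for take(); null if empty
        if (topic == null) {
            return false;                   // nothing queued
        }
        if (submitDeletion.test(topic)) {
            return true;                    // submitted; the request is done
        }
        queue.add(topic);                   // failed: re-queue at the tail for a later retry
        return false;
    }

    /** Option 2: peek() first, and only remove the request once submission succeeds. */
    static boolean peekThenRemoveOnSuccess(BlockingQueue<String> queue,
                                           Predicate<String> submitDeletion) {
        String topic = queue.peek();        // looks at the head without removing it
        if (topic == null || !submitDeletion.test(topic)) {
            return false;                   // empty, or failed: the head stays in place
        }
        queue.poll();                       // assumes a single consumer, so the head is unchanged
        return true;
    }
}
```

One difference worth noting: option 1 moves a failed request to the back of the queue, so other deletions can proceed in the meantime, while option 2 keeps it at the head, so nothing else is attempted until it succeeds. Neither variant waits between attempts, so a caller would probably want some delay before retrying.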

Ludden commented 5 years ago

This is interesting, which Kafka bug is it referring to?

wushujames commented 5 years ago

No particular Kafka bug. I was thinking more of a case where the network between kafka-topic-manager and the broker drops out for a second, and we are unable to submit the topic deletion to the broker.

Ludden commented 5 years ago

Sorry, the motivation for kafka-topic-manager mentions a bug in Kafka when deleting topics too quickly. Would you have a reference to that bug? I've experienced something similar but can't find the related bug.

wushujames commented 5 years ago

Oh right. I can’t look it up right now (spotty internet connection), and I don’t know if there is a specific bug filed.

From what I remember, it had to do with how the controller gets notified of ZooKeeper changes. Topic deletions are done by writing a ZooKeeper node under the /admin/delete_topics node. If one topic gets deleted (for example, “topic1”), the controller gets notified once (“delete topic1”). However, if two topics are deleted right after each other (“topic1” and “topic2”), the controller gets notified twice. One notification says “delete topic1” and the other says “delete topic1 and topic2”. When the controller gets to that second notification, it tries to delete topic1 again, even though it has already been deleted.

And so a large number of deletions causes the work the controller tries to do to pile up. (Is this quadratic growth? I think that’s the right word.)
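
As a back-of-the-envelope check, assume the behavior is as described above: after the k-th of n rapid deletions, the controller receives a notification listing all k topics queued so far and attempts each one. Under that assumption (taken from memory, not verified against the Kafka source), the total number of deletion attempts is:

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2}
```

For n = 100 that is 5050 attempts for 100 topics, so "quadratic" would be the right word if the assumption holds.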

Anyway, that’s what I remember. I would need to search the Kafka JIRAs to find the exact bug. Todd Palino thinks this may be fixed in 2.0.0.

-James


On Feb 20, 2019, at 6:02 PM, Ludden notifications@github.com wrote:

Sorry, the motivation for kafka-topic-manager mentions a bug in Kafka when deleting topics too quickly. Would you have a reference to that bug? I've experienced something similar but can't find the related bug.


wushujames commented 5 years ago

@Ludden : The Kafka community recently filed a JIRA on this issue. https://issues.apache.org/jira/browse/KAFKA-8180

Ludden commented 5 years ago

Thanks, I'll keep an eye on this ticket

wushujames / kafka-topic-manager

Handle the case where deletion requests can't go through #4