Open wushujames opened 5 years ago
This is interesting, which Kafka bug is it referring to?
No particular Kafka bug. I was thinking more of a case where the network between kafka-topic-manager and the broker drops out for a second, and we are unable to submit the topic deletion to the broker.
Sorry, the motivation for kafka-topic-manager mentions a bug in Kafka when deleting topics too quickly. Would you have a reference to that bug? I've experienced something similar but can't find the related bug.
Oh right. I can’t look it up right now (spotty internet connection) but I don’t know if there is a specific bug filed.
From what I remember, it had to do with how the controller gets notified of ZooKeeper changes. Topic deletions are done by writing a ZooKeeper node under the /admin/delete_topics node. If one topic gets deleted (for example, “topic1”), the controller gets notified once (“delete topic1”). However, if two topics are deleted right after each other (“topic1” and “topic2”), the controller gets notified twice. One notification says “delete topic1” and the other says “delete topic1 and topic2”. When the controller gets to that second notification, it tries to delete topic1 again, even though it has already been deleted.
And so a large number of deletions causes a pile-up in the amount of work the controller tries to do. (Is this quadratic growth? I think that’s the right word.)
Anyway, that’s what I remember. I would need to search the Kafka JIRAs to find the exact bug. Todd Palino thinks this may be fixed in 2.0.0.
-James
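On the "is this quadratic?" question above, here is a rough sanity check. It is only a model of the behavior as recalled in this thread, not Kafka controller code: it assumes the k-th notification lists all k topics queued so far and that the controller attempts every topic named in each notification. The class and method names are made up for illustration.

```java
public class DeletionPileUp {
    // Hypothetical model, not Kafka code: notification k carries k topics,
    // and the controller attempts a delete for every topic it lists.
    static long totalDeleteAttempts(int topics) {
        long attempts = 0;
        for (int k = 1; k <= topics; k++) {
            attempts += k; // the k-th notification triggers k attempts
        }
        return attempts;   // equals topics * (topics + 1) / 2
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 10, 100}) {
            System.out.println(n + " topics -> " + totalDeleteAttempts(n) + " attempts");
        }
    }
}
```

Under that assumption, n back-to-back deletions cost 1 + 2 + ... + n = n(n+1)/2 attempts, so "quadratic" would be the right word.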
@Ludden: The Kafka community recently filed a JIRA on this issue. https://issues.apache.org/jira/browse/KAFKA-8180
Thanks, I'll keep an eye on this ticket
The current code pops the deletion request off the queue and then tries to work on it. If the request fails to be submitted to the broker for some reason (an intermittent error, for example), we never try it again.
Possible solutions:
- peek() the queue, attempt delete, and only take() from the queue when we succeed
- put it back on the queue again
seems simplest.
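For reference, the peek()-then-take() option could look roughly like the sketch below. This is not the project's actual code: `PeekThenRemove`, `processHead`, and the `submitDelete` callback are hypothetical names, the callback stands in for whatever call submits the deletion to the broker, and a single consumer thread is assumed. It uses `poll()` as the non-blocking counterpart of `take()`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

public class PeekThenRemove {
    // Hypothetical helper. `submitDelete` returns true when the broker
    // accepted the deletion request and false on any failure.
    static boolean processHead(BlockingQueue<String> queue, Predicate<String> submitDelete) {
        String topic = queue.peek();   // look at the head without removing it
        if (topic == null) {
            return false;              // nothing queued
        }
        if (!submitDelete.test(topic)) {
            return false;              // failed: the request stays queued for a retry
        }
        queue.poll();                  // success: now remove it (single consumer assumed)
        return true;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("topic1");

        // Simulated intermittent failure: the request must remain on the queue.
        System.out.println(processHead(queue, t -> false) + " " + queue.size()); // false 1

        // Retry succeeds: the request is removed.
        System.out.println(processHead(queue, t -> true) + " " + queue.size());  // true 0
    }
}
```

One trade-off to keep in mind: with this approach a request that keeps failing stays at the head and blocks everything behind it, whereas putting it back on the queue moves it to the tail.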