Open suyashtava opened 2 years ago
@linkedin, @mhratson, @andrewchoi5 @Lincong for review
@efeg for review, please.
@CCisGG Is there anyone you can add for review, please?
I don't have much context on this repo. @mitchhh22 could you or your team help to review this?
I can take a look next week. /Maryan
@suyashtava thanks for the contribution! Before accepting the PR I'd like to understand more about the problem it solves.
Could you please describe why restarting is an issue? I can assume that restarting may be slow, but I'd like to know any other arguments as well, if available.
Thanks
@mhratson
Background: We were using KMF for broker health detection. Because all partitions use the same Producer, even one slow broker in a cluster also slows down the other partitions waiting in the same queue, which made it difficult to find the unhealthy broker.
To address this, we introduced a 1:1 mapping of partition to Producer, so that the other partitions can still be produced to by their own Producers and are not blocked by one slow Producer.
Challenge: On every shutdown we now need to close multiple Producers, which we did. But whenever a new partition was added, all the Producers were restarted again, which caused significant slowness.
Proposal: We can discuss the 1:1 Producer:partition mapping separately, but restarting all threads on each new partition is surely an overhead even in the current KMF, when we could simply add the new partition to the scheduler.
If the 1:1 mapping of Producer to partition sounds good, I can raise another PR for it after this one. @mitchhh22 @mhratson @andrewchoi5 @Lincong @efeg
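To make the 1:1 idea concrete, here is a minimal sketch of a per-partition producer registry. It is only an illustration under assumptions: `PartitionProducer` and `PerPartitionProducers` are hypothetical names, not types from kafka-monitor or the Kafka client, and a real version would wrap an actual producer instance.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a Kafka producer; not a kafka-monitor or Kafka client type. */
interface PartitionProducer {
    void send(String payload);
    void close();
}

/** Hypothetical registry that keeps exactly one producer per partition. */
class PerPartitionProducers {
    private final Map<Integer, PartitionProducer> producers = new ConcurrentHashMap<>();
    private final Function<Integer, PartitionProducer> factory;

    PerPartitionProducers(Function<Integer, PartitionProducer> factory) {
        this.factory = factory;
    }

    /** Returns the producer for this partition, creating it on first use only. */
    PartitionProducer forPartition(int partition) {
        return producers.computeIfAbsent(partition, factory);
    }

    int size() {
        return producers.size();
    }

    /** Closes every producer (the shutdown path) and returns how many were closed. */
    int closeAll() {
        int closed = 0;
        for (PartitionProducer producer : producers.values()) {
            producer.close();
            closed++;
        }
        producers.clear();
        return closed;
    }
}
```

With this shape, a new partition only triggers one additional `forPartition` call; the producers that already exist are left untouched.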
@mhratson by any chance, could you find a moment to check this? Thanks in advance.
> For this, we introduced 1:1 mapping of Partition and Producer so other partitions can still be produced by a producer and not blocked by other Producer feeling slow.

That's not the case for this kafka-monitor, is it?
@mhratson Apologies, for personal reasons I had to step away for a while. Reopening this thread. You are correct, this is not the case for this KMF.
This opens two things: IMHO we should decouple the Producer for each partition here as well, so that it is easy to detect which broker is slow.
Even if we decide against that, at least we should not restart the whole Producer when a new partition appears. The new partition should be attached to the same Producer, which would increase KMF availability.
Added PR #394 for issue #395 (multiple Producers per partition).
On every new partition, we kill the whole Producer service and restart it. Instead, we can update the partition state to include the new partitions, which avoids the unnecessary restart.
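As a rough sketch of the "update the partition state instead of restarting" idea (not the actual code in this PR), the scheduler below adds periodic produce tasks only for partitions it has not seen yet. `IncrementalProduceScheduler`, `onPartitionCount`, and the `produceToPartition` callback are hypothetical names chosen for illustration; only the `java.util.concurrent` calls are real APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;

/** Hypothetical scheduler that grows with the partition count instead of restarting. */
class IncrementalProduceScheduler {
    private final ScheduledExecutorService executor;
    private final IntConsumer produceToPartition; // assumed callback that sends one record
    private final long periodMs;
    private final Map<Integer, ScheduledFuture<?>> tasks = new ConcurrentHashMap<>();

    IncrementalProduceScheduler(ScheduledExecutorService executor,
                                IntConsumer produceToPartition,
                                long periodMs) {
        this.executor = executor;
        this.produceToPartition = produceToPartition;
        this.periodMs = periodMs;
    }

    /** Schedules tasks only for partitions not yet covered; returns how many were added. */
    synchronized int onPartitionCount(int partitionCount) {
        int added = 0;
        for (int p = 0; p < partitionCount; p++) {
            final int partition = p;
            if (!tasks.containsKey(partition)) {
                ScheduledFuture<?> future = executor.scheduleWithFixedDelay(
                    () -> produceToPartition.accept(partition),
                    0, periodMs, TimeUnit.MILLISECONDS);
                tasks.put(partition, future);
                added++;
            }
        }
        return added;
    }

    /** Exposes the task handle so callers can check that existing tasks were kept. */
    ScheduledFuture<?> taskFor(int partition) {
        return tasks.get(partition);
    }

    synchronized void shutdown() {
        executor.shutdownNow();
        tasks.clear();
    }
}
```

Calling `onPartitionCount(3)` and later `onPartitionCount(5)` schedules two new tasks on the second call and leaves the first three running, which is the behaviour the description above argues for.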
Issue: https://github.com/linkedin/kafka-monitor/issues/376