linkedin / venice

Venice, Derived Data Platform for Planet-Scale Workloads.
https://venicedb.org
BSD 2-Clause "Simplified" License

[server] Add store-aware partition-wise shared consumer assignment strategy #1261

Closed sixpluszero closed 4 days ago

sixpluszero commented 1 week ago


Add a new version of the partition-wise shared consumer assignment strategy. We have been seeing subscriptions to the same topic / store assigned to the same consumer. From the point of view of a particular store push (incremental or full), these subscriptions compete with each other, so one of them becomes the long-tail partition and slows down overall progress. Assuming the store/topic itself has no data skew, we should spread these subscriptions across consumers as evenly as possible. This matters especially for RT topics: the backup, current, and future versions share the same input volume, so we should not treat them differently within the same pool, although this can be optimized further at a different level (the pool assignment strategy).

With the new strategy, when a new topic partition needs an assignment, the strategy computes and sorts every consumer's load based on both the general load and the store-specific load, then assigns the new topic partition to the least-loaded consumer.
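The idea can be illustrated with a minimal, self-contained sketch. This is not the code in this PR: the class name `StoreAwareAssignmentSketch`, the `STORE_LOAD_WEIGHT` constant, and the use of partition counts as the load metric are all hypothetical, chosen only to show how a store-specific load term steers partitions of the same store onto different consumers. It also picks the minimum with a linear scan rather than a full sort.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and the load formula are assumptions, not Venice APIs.
class StoreAwareAssignmentSketch {
  // Hypothetical weight: one partition of the same store counts far more than
  // one partition of any other store when scoring a consumer.
  static final long STORE_LOAD_WEIGHT = 1000L;

  // consumer name -> (store name -> number of that store's partitions on the consumer)
  private final Map<String, Map<String, Integer>> assignments = new LinkedHashMap<>();

  StoreAwareAssignmentSketch(int consumerCount) {
    if (consumerCount < 1) {
      throw new IllegalArgumentException("At least one consumer is required");
    }
    for (int i = 0; i < consumerCount; i++) {
      assignments.put("consumer-" + i, new HashMap<>());
    }
  }

  // Load of a consumer as seen by a new partition of the given store:
  // weighted store-specific load plus general load (all partitions on the consumer).
  long computeLoad(String consumer, String store) {
    Map<String, Integer> perStore = assignments.get(consumer);
    long generalLoad = 0;
    for (int count : perStore.values()) {
      generalLoad += count;
    }
    long storeLoad = perStore.getOrDefault(store, 0);
    return storeLoad * STORE_LOAD_WEIGHT + generalLoad;
  }

  // Assign a new partition of the given store to the least-loaded consumer.
  // Ties go to the consumer that was registered first.
  String assign(String store) {
    String best = null;
    long bestLoad = Long.MAX_VALUE;
    for (String consumer : assignments.keySet()) {
      long load = computeLoad(consumer, store);
      if (load < bestLoad) {
        best = consumer;
        bestLoad = load;
      }
    }
    assignments.get(best).merge(store, 1, Integer::sum);
    return best;
  }
}
```

With two consumers, two consecutive `assign("storeA")` calls in this sketch land on different consumers, because the first assignment raises the store-specific load of the chosen consumer well above the general load of the other one.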

How was this PR tested?

Added a new unit test; an integration test will be added as well.

Does this PR introduce any user-facing changes?