Open shanicky opened 8 months ago
Currently there are three strategies: Adaptive, Fixed and Custom.
This design totally makes sense to me, but I am a bit afraid that it may be too obscure for our users.
If we can agree that Custom is not that necessary, I would recommend following #12058, where @neverchanje and I proposed this syntax:
streaming_parallelism = AUTO: use the Adaptive policy here
streaming_parallelism = <number>: use the Fixed policy here (also corresponds to the status quo)
What do you think?
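To make the proposed mapping concrete, here is a minimal sketch of how a `streaming_parallelism` value could be resolved to a policy. The names `Adaptive`, `Fixed` and `parse_streaming_parallelism` are hypothetical and not taken from the RisingWave codebase; case-insensitive matching of `AUTO` and rejection of zero are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Adaptive:
    """Follow whatever parallelism the cluster currently offers."""


@dataclass(frozen=True)
class Fixed:
    """Pin the job to a user-chosen parallelism."""
    parallelism: int


Policy = Union[Adaptive, Fixed]


def parse_streaming_parallelism(value: str) -> Policy:
    """Map the textual setting onto a policy (hypothetical helper)."""
    text = value.strip()
    # Assumption: AUTO is matched case-insensitively.
    if text.upper() == "AUTO":
        return Adaptive()
    # Assumption: only positive decimal integers are valid fixed values.
    if text.isdecimal() and int(text) > 0:
        return Fixed(int(text))
    raise ValueError(f"invalid streaming_parallelism: {value!r}")
```

With this shape, `parse_streaming_parallelism("AUTO")` yields `Adaptive()` and `parse_streaming_parallelism("4")` yields `Fixed(4)`; anything else is rejected, which reflects the proposal of not exposing Custom to users.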
BTW, #13270 used a global parameter opts.enable_scale_in_when_recovery, which will break the design. We need to settle the design before releasing enable_scale_in_when_recovery.
Linking https://github.com/risingwavelabs/risingwave/issues/12741, as I am not sure whether the fact that we may want a table's partition number (equal to the streaming parallelism right now) has some minor impact on this design.
For the current implementation, users have to manually execute risectl to scale up or down the parallel unit after bringing a new CN online or before taking a CN offline. This can be quite cumbersome in some scenarios, so we need an automated scaling strategy to handle backend expansion and contraction.
At present, our designed solution is to automatically control the parallelism of a group of streaming jobs bound by NOSHUFFLE (or, in the future, Ensemble). A change made through the user interface on any one of these streaming jobs cascades to the whole group (because the parallelism is bound).
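The cascading behavior can be illustrated with a small union-find sketch: jobs bound by NOSHUFFLE end up in one group, and the group shares a single parallelism value. The class `ParallelismGroups` and its methods are hypothetical, and the rule that a downstream group adopts the upstream group's parallelism when bound is an assumption made only for this illustration.

```python
class ParallelismGroups:
    """Tracks groups of streaming jobs whose parallelism is bound together."""

    def __init__(self):
        self._parent = {}       # job -> parent job in the union-find forest
        self._parallelism = {}  # group root -> parallelism shared by the group

    def add_job(self, job, parallelism):
        self._parent[job] = job
        self._parallelism[job] = parallelism

    def _find(self, job):
        # Walk up to the group root, halving the path along the way.
        while self._parent[job] != job:
            self._parent[job] = self._parent[self._parent[job]]
            job = self._parent[job]
        return job

    def bind_no_shuffle(self, upstream, downstream):
        up, down = self._find(upstream), self._find(downstream)
        if up != down:
            self._parent[down] = up
            # Assumption: the merged group keeps the upstream parallelism.
            self._parallelism.pop(down)

    def set_parallelism(self, job, parallelism):
        # Changing any one job changes the whole bound group.
        self._parallelism[self._find(job)] = parallelism

    def parallelism_of(self, job):
        return self._parallelism[self._find(job)]
```

For example, after binding a table `t` to `mv1` and `mv1` to `mv2`, calling `set_parallelism("mv2", 6)` makes `parallelism_of("t")` return 6 as well, while an unbound job keeps its own value.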
Currently there are three strategies: Adaptive, Fixed and Custom.
Adaptive will automatically scale up and down to the current parallelism limit of the cluster.
Fixed will keep the parallelism fixed: it stays the same while cluster nodes go online and offline, but it will generate migrations to balance traffic. Before the Ensemble feature goes live, we need a compatibility mode to define the behavior when the available parallel units of the cluster are fewer than the fixed number.
Custom is a low-level mode prepared for the cloud team, intended for potential refined traffic control in the future. It's not something we're considering at the moment. Any fragments marked as 'custom' (if any exist) will not perform any actions.
Question: