rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Dynamic sharding support #2357

Open weyfonk opened 2 months ago

weyfonk commented 2 months ago

Fleet's current sharding implementation (see #1740) only supports configuring shards at deployment time. Any change to the set of supported shards requires Fleet to be redeployed with the new set of shards.

Instead, we could enable shards to be dynamically reconfigured. This could be achieved by:

  1. Maintaining a ConfigMap storing the set of supported shards, which a user could then edit/patch.
  2. Adding a controller to react to changes to that ConfigMap, deleting or adding Fleet controllers accordingly (see the sketch after this list).
    • What should happen to shards that are being deleted while resources (e.g. GitRepos) are still mapped to them? Should we expose configuration to allow or prevent force-deletion of such shards, issue warnings, etc.?
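As a rough, non-authoritative sketch of items 1 and 2 (the ConfigMap name `fleet-shards`, its `shards` data key, and the `fleet.cattle.io/shard-id` Deployment label are assumptions, not existing Fleet names), a controller-runtime reconciler could diff the desired shard list against the per-shard controller Deployments:

```go
// Sketch only: reconcile a ConfigMap listing the desired shards against the
// per-shard Fleet controller Deployments currently running.
package shardconfig

import (
	"context"
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type ShardConfigReconciler struct {
	client.Client
	Namespace string // namespace holding the ConfigMap and the controller Deployments
}

func (r *ShardConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Only react to the shard ConfigMap (hypothetical name).
	if req.Name != "fleet-shards" || req.Namespace != r.Namespace {
		return ctrl.Result{}, nil
	}

	var cm corev1.ConfigMap
	if err := r.Get(ctx, req.NamespacedName, &cm); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Desired shards, e.g. data: {shards: "shard-a,shard-b,shard-c"}.
	desired := map[string]bool{}
	for _, s := range strings.Split(cm.Data["shards"], ",") {
		if s = strings.TrimSpace(s); s != "" {
			desired[s] = true
		}
	}

	// Existing per-shard controller Deployments, selected by an assumed shard label.
	var deployments appsv1.DeploymentList
	if err := r.List(ctx, &deployments, client.InNamespace(r.Namespace),
		client.HasLabels{"fleet.cattle.io/shard-id"}); err != nil {
		return ctrl.Result{}, err
	}
	existing := map[string]bool{}
	for _, d := range deployments.Items {
		existing[d.Labels["fleet.cattle.io/shard-id"]] = true
	}

	var toAdd, toRemove []string
	for s := range desired {
		if !existing[s] {
			toAdd = append(toAdd, s)
		}
	}
	for s := range existing {
		if !desired[s] {
			toRemove = append(toRemove, s)
		}
	}

	// toAdd: create a per-shard controller Deployment for each new shard (omitted).
	// toRemove: delete the Deployment, but first decide what happens to GitRepos
	// still mapped to that shard (warn, block, or force-delete, per the open question above).
	_, _ = toAdd, toRemove
	return ctrl.Result{}, nil
}

func (r *ShardConfigReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}).
		Complete(r)
}
```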

Nice to have:

Dependencies

More info

Here's how Flux did sharding:

shane-davidson commented 1 month ago

Why not just query the main controller for the shard you are supposed to use? If that shard fails or goes away, just fall back to querying the main controller again.

Why should the end user need to map the gitrepo to the shard?

Is manually configuring/managing the mapping between Git repos and shards sustainable for very large deployments? (We use 6 Rancher environments with 15+ Git repos × 6 workspaces each, i.e. 540+ Git repos in total.)

Surely the lowest cost of entry for "dynamic" sharding would be to simply say how many shards you want, and Fleet would automatically balance the load (based on the number of resources per repo, or just a UUID of the Git repo, or some other smarts).
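A minimal sketch of that idea (not Fleet's implementation; hashing the GitRepo UID is just one possible strategy): hash a stable identifier of each GitRepo into one of N shards, so no manual mapping is needed.

```go
// Sketch: deterministically spread GitRepos over a fixed number of shards
// by hashing a stable identifier (here the GitRepo's UID).
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a stable GitRepo identifier to one of n shards.
func shardFor(gitRepoUID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(gitRepoUID))
	return h.Sum32() % n
}

func main() {
	// With 3 shards, every GitRepo lands on shard 0, 1 or 2, with no manual mapping.
	for _, uid := range []string{"repo-a-uid", "repo-b-uid", "repo-c-uid"} {
		fmt.Printf("%s -> shard %d\n", uid, shardFor(uid, 3))
	}
}
```

A plain modulo reshuffles most repos whenever the shard count changes; consistent hashing, or balancing by the number of resources per repo as suggested above, would reduce that churn.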

manno commented 1 month ago

Why should the end user need to map the gitrepo to the shard?

For now, we are investigating whether we can change the number of shards at runtime, without reinstalling Fleet. Automatically assigning GitRepos to shards is a very interesting problem, which we might take on in a future version. Indeed, we would write a new controller to balance the shard labels.
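A rough sketch of what such a balancing controller might do per GitRepo (using unstructured objects; the `fleet.cattle.io/shard-id` label key is assumed here rather than taken from Fleet's code):

```go
// Sketch: rewrite the shard label on a GitRepo so a balancing controller can
// spread repos across the configured shards.
package shardbalancer

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

var gitRepoGVK = schema.GroupVersionKind{
	Group: "fleet.cattle.io", Version: "v1alpha1", Kind: "GitRepo",
}

// assignShard patches a GitRepo so it carries the given shard label
// (label key assumed, not taken from Fleet's code).
func assignShard(ctx context.Context, c client.Client, namespace, name, shard string) error {
	repo := &unstructured.Unstructured{}
	repo.SetGroupVersionKind(gitRepoGVK)
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, repo); err != nil {
		return err
	}
	patch := client.MergeFrom(repo.DeepCopy())
	labels := repo.GetLabels()
	if labels == nil {
		labels = map[string]string{}
	}
	labels["fleet.cattle.io/shard-id"] = shard
	repo.SetLabels(labels)
	return c.Patch(ctx, repo, patch)
}
```

The balancing policy itself (round-robin over the shard list, least-loaded shard, or something resource-count based) would sit on top of a helper like this.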