rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0

Dynamic sharding support #2357

Open weyfonk opened 2 months ago

weyfonk commented 2 months ago

Fleet's current sharding implementation (see #1740) only supports configuring shards at deployment time. Any change to the set of supported shards requires Fleet to be redeployed with the new set of shards.

Instead, we could enable shards to be dynamically reconfigured. This could be achieved by:

  1. Maintaining a ConfigMap storing the set of supported shards, which a user could then edit/patch.
  2. Adding a controller to react to changes to that ConfigMap, deleting or adding Fleet controllers accordingly (see the sketch after this list).
    • What should happen to shards that are being deleted while resources (e.g. GitRepos) are still mapped to them? Should we expose configuration to allow or prevent force-deletion of such shards, issue warnings, etc.?
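As a rough, non-authoritative sketch of items 1 and 2 (the ConfigMap name `fleet-shards`, its `shards` data key, and the `fleet.cattle.io/shard-id` Deployment label are assumptions, not existing Fleet names), a controller-runtime reconciler could diff the desired shard list against the per-shard controller Deployments:

```go
// Sketch only: reconcile a ConfigMap listing the desired shards against the
// per-shard Fleet controller Deployments currently running.
package shardconfig

import (
	"context"
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type ShardConfigReconciler struct {
	client.Client
	Namespace string // namespace holding the ConfigMap and the controller Deployments
}

func (r *ShardConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Only react to the shard ConfigMap (hypothetical name).
	if req.Name != "fleet-shards" || req.Namespace != r.Namespace {
		return ctrl.Result{}, nil
	}

	var cm corev1.ConfigMap
	if err := r.Get(ctx, req.NamespacedName, &cm); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Desired shards, e.g. data: {shards: "shard-a,shard-b,shard-c"}.
	desired := map[string]bool{}
	for _, s := range strings.Split(cm.Data["shards"], ",") {
		if s = strings.TrimSpace(s); s != "" {
			desired[s] = true
		}
	}

	// Existing per-shard controller Deployments, selected by an assumed shard label.
	var deployments appsv1.DeploymentList
	if err := r.List(ctx, &deployments, client.InNamespace(r.Namespace),
		client.HasLabels{"fleet.cattle.io/shard-id"}); err != nil {
		return ctrl.Result{}, err
	}
	existing := map[string]bool{}
	for _, d := range deployments.Items {
		existing[d.Labels["fleet.cattle.io/shard-id"]] = true
	}

	var toAdd, toRemove []string
	for s := range desired {
		if !existing[s] {
			toAdd = append(toAdd, s)
		}
	}
	for s := range existing {
		if !desired[s] {
			toRemove = append(toRemove, s)
		}
	}

	// toAdd: create a per-shard controller Deployment for each new shard (omitted).
	// toRemove: delete the Deployment, but first decide what happens to GitRepos
	// still mapped to that shard (warn, block, or force-delete, per the open question above).
	_, _ = toAdd, toRemove
	return ctrl.Result{}, nil
}

func (r *ShardConfigReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}).
		Complete(r)
}
```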

Nice to have:

Dependencies

More info

Here's how Flux did sharding:

shane-davidson commented 1 month ago

Why not just query the main controller for the shard you are supposed to use? If that shard fails or goes away, just fall back to querying the main controller again.

Why should the end user need to map the gitrepo to the shard?

Is manually configuring/managing the mapping between Git repos and shards sustainable for very large deployments? (We use 6 Rancher environments with 15+ Git repos × 6 workspaces each, i.e. 540+ Git repos in total.)

Surely the lowest cost of entry for "dynamic" sharding would be to simply say how many shards you want, and Fleet would automatically balance the load (based on the number of resources per repo, or just a UUID of the Git repo, or some other smarts).
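A minimal sketch of that idea (not Fleet's implementation; hashing the GitRepo UID is just one possible strategy): hash a stable identifier of each GitRepo into one of N shards, so no manual mapping is needed.

```go
// Sketch: deterministically spread GitRepos over a fixed number of shards
// by hashing a stable identifier (here the GitRepo's UID).
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a stable GitRepo identifier to one of n shards.
func shardFor(gitRepoUID string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(gitRepoUID))
	return h.Sum32() % n
}

func main() {
	// With 3 shards, every GitRepo lands on shard 0, 1 or 2, with no manual mapping.
	for _, uid := range []string{"repo-a-uid", "repo-b-uid", "repo-c-uid"} {
		fmt.Printf("%s -> shard %d\n", uid, shardFor(uid, 3))
	}
}
```

A plain modulo reshuffles most repos whenever the shard count changes; consistent hashing, or balancing by the number of resources per repo as suggested above, would reduce that churn.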

manno commented 1 month ago

Why should the end user need to map the gitrepo to the shard?

For now, we are investigating whether we can change the number of shards at runtime, without reinstalling Fleet. Automatically assigning GitRepos to shards is a very interesting problem, which we might take on in a future version. Indeed, we would write a new controller to balance the shard labels.
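A rough sketch of what such a balancing controller might do per GitRepo (using unstructured objects; the `fleet.cattle.io/shard-id` label key is assumed here rather than taken from Fleet's code):

```go
// Sketch: rewrite the shard label on a GitRepo so a balancing controller can
// spread repos across the configured shards.
package shardbalancer

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

var gitRepoGVK = schema.GroupVersionKind{
	Group: "fleet.cattle.io", Version: "v1alpha1", Kind: "GitRepo",
}

// assignShard patches a GitRepo so it carries the given shard label
// (label key assumed, not taken from Fleet's code).
func assignShard(ctx context.Context, c client.Client, namespace, name, shard string) error {
	repo := &unstructured.Unstructured{}
	repo.SetGroupVersionKind(gitRepoGVK)
	if err := c.Get(ctx, client.ObjectKey{Namespace: namespace, Name: name}, repo); err != nil {
		return err
	}
	patch := client.MergeFrom(repo.DeepCopy())
	labels := repo.GetLabels()
	if labels == nil {
		labels = map[string]string{}
	}
	labels["fleet.cattle.io/shard-id"] = shard
	repo.SetLabels(labels)
	return c.Patch(ctx, repo, patch)
}
```

The balancing policy itself (round-robin over the shard list, least-loaded shard, or something resource-count based) would sit on top of a helper like this.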