oxidecomputer / omicron

Omicron: Oxide control plane
Mozilla Public License 2.0
252 stars 40 forks source link

clickhouse-admin: Reject old configurations #7137

Open andrewjstone opened 10 hours ago

andrewjstone commented 10 hours ago

Currently, everytime the reconfigurator executor pushes a configuration to clickhouse-admin-keeper or clickhouse-admin-server, the XML configuration file is generated and reloaded by the keeper or server process respectively. This is somewhat aesthetically displeasing if we consistently rewrite the same configuration file, but we decided when closing #6909 that it's the more robust way to handle reconfiguration failures than any alternative.

However, for correctness purposes, we still want to reject older configurations from overwriting newer ones. This can wreak havoc on the clusters. While it's very unlikely to see this currently due to how we manage changes, it is still possible with N independent nexuses pushing down their own latest configuration.

To fix this, we should save the current configuration (with its generation number) in an auxiliary file, and keep this generation number in memory at the admin-server. We can then ignore any attempts to use an old configuration.

karencfv commented 9 hours ago

I think I'm overlooking something, but why do we need to save the entire current configuration and not just the generation number? Generation numbers are always assigned sequentially no?

andrewjstone commented 8 hours ago

I think I'm overlooking something, but why do we need to save the entire current configuration and not just the generation number? Generation numbers are always assigned sequentially no?

It's not strictly necessary. It just may be useful for debugging. Overhead is minimal.

karencfv commented 8 hours ago

Gotcha, yeah that makes sense