Open andrwng opened 6 months ago
Wonder if we should bubble up to the health status the fact that the cluster config is in a state requiring a restart. Right now it's buried in an API the user wouldn't normally access. Or treat it as a metric state that they can alert on (similar to what we do now with some disk alert functions).
@hcoyote : Please file a separate ticket for your good suggestion there -- and that'll go to @aanthony-rp and team.
Let's keep this ticket for the original issue that Andrew filed.
Today, when you change tiered storage configs (e.g. to start using tiered storage), the underlying cloud_storage::remote() doesn't get constructed until the process is restarted. It'd be nice if the remote could be rebuilt at runtime, but this is a tricky task, considering all the places the remote leaks into different abstractions with the expectation that it is constructed once at startup.
Some surprises that come out of this:
We should refine the behavior during the span between setting tiered storage configs and the next restart. A simple strawman proposal is to at least reject topic creation. It doesn't solve everything, though, because topics that have already been created can have equally surprising behavior when tiered storage cluster configs are toggled.
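A minimal sketch of what the strawman could look like, not how Redpanda actually wires this up. All names here (`cluster_config`, `runtime_state`, `validate_topic_creation`) are hypothetical stand-ins. It also assumes that "pending restart" means the configured tiered storage setting disagrees with whether the remote was constructed at startup, in either direction.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the cluster-level tiered storage config.
struct cluster_config {
    bool cloud_storage_enabled = false;
};

// Hypothetical view of what this process actually built at startup.
struct runtime_state {
    bool remote_constructed = false;
};

// Assumption: a restart is pending whenever the configured value and the
// startup-time state of the remote disagree (enabled or disabled since boot).
inline bool tiered_storage_restart_pending(
  const cluster_config& cfg, const runtime_state& rt) {
    return cfg.cloud_storage_enabled != rt.remote_constructed;
}

// Returns an error message if topic creation should be rejected,
// or std::nullopt if it may proceed.
inline std::optional<std::string> validate_topic_creation(
  const cluster_config& cfg,
  const runtime_state& rt,
  const std::string& topic_name) {
    assert(!topic_name.empty());
    if (tiered_storage_restart_pending(cfg, rt)) {
        return "cannot create topic '" + topic_name
               + "': tiered storage config changed, restart required";
    }
    return std::nullopt;
}
```

This only guards new topics. As noted above, it does nothing for topics that already exist when the config is toggled.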
Some other, spicier options to consider:
When the config status is needs-restart, perhaps we should stop/shutdown and rebuild most of the application. This seems a bit risky to do automatically, but it would avoid this problem.
JIRA Link: CORE-2911