Closed. lbwexler closed this 2 months ago.
Looks good to me!
My main concern is that the timer having finished on the primary does not necessarily mean its results have been replicated, or that it doesn't need to run on the new primary. Thinking about very large datasets, the primary could be brought down in the middle of serialisation, leaving the new primary with no data. That seems like an edge case we may not need to worry about, though.
Tom -- that's a really good and subtle point; I had not thought about that one. The main concern there is indeed about guarantees around "initialization" (e.g. if a cluster job were simply to miss a cycle of refresh work, that does not seem like a huge problem).
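One hedged way to address Tom's scenario is to have the primary record a completion marker only *after* its timer results are fully written, so a new primary can detect an interrupted run and re-execute it. A minimal sketch below; the class, method names, and the `ConcurrentHashMap` stand-ins for cluster-replicated maps are all hypothetical, not actual Hoist/Hazelcast API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TimerCompletionMarker {
    // Stand-ins for cluster-replicated maps; in a real cluster these would
    // be shared structures visible to the failover primary.
    static final Map<String, Object> data = new ConcurrentHashMap<>();
    static final Map<String, Long> completedRuns = new ConcurrentHashMap<>();

    // Primary node: write results first, record completion last, so a run
    // interrupted mid-serialisation leaves no completion marker behind.
    static void runTimerOnPrimary(String timerName) {
        data.put(timerName + ":results", computeResults());
        completedRuns.put(timerName, System.currentTimeMillis());
    }

    // New primary after failover: only skip the run if the previous
    // primary's completion marker actually made it across.
    static boolean shouldRunOnNewPrimary(String timerName, long intervalMs) {
        Long last = completedRuns.get(timerName);
        return last == null || System.currentTimeMillis() - last >= intervalMs;
    }

    static Object computeResults() { return "results"; }
}
```

Under this pattern, a crash between the data write and the marker write still causes one re-run, which is safe if the timer work is idempotent.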
This concern would be mitigated by the change Greg and I have been discussing: having all nodes block their initialization until the primary node has indicated that it completed its own initialization. I suppose there would still be some risk that the initialization didn't actually complete, if we have no way of knowing when its results are fully sent.
Truly hardening some of this might require the Hazelcast CP Subsystem, which we don't have access to because it is in the enterprise product.
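The "block until the primary signals init-complete" handshake can be sketched with a plain `CountDownLatch` standing in for the cluster-wide signal (in practice this would be something like a Hazelcast topic message or replicated flag; the class and method names here are illustrative assumptions):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ClusterInitGate {
    // Stand-in for a cluster-wide signal; a plain latch just illustrates
    // the handshake between the primary and the other nodes.
    private final CountDownLatch primaryInitialized = new CountDownLatch(1);

    // Primary calls this only after its initialization fully completes.
    public void markPrimaryInitialized() {
        primaryInitialized.countDown();
    }

    // Non-primary nodes block here before starting their own init, with a
    // timeout so a crashed primary cannot hang the rest of the cluster.
    public boolean awaitPrimary(long timeoutMs) throws InterruptedException {
        return primaryInitialized.await(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

The timeout is the important design choice: a node that times out can log a warning and proceed (or fail fast) rather than wait forever on a primary that died mid-initialization.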
- New `BaseService` factory methods + docs
- Require/check unique names for all service resources
- Use `NamedVariant` for `Timer`
- Improvements to `Timer` to avoid extra executions when the primary instance changes
- Unrelated cleanups to `Timer` declarations; deprecate `onTimer`
- Logs should get archived on non-primary servers as well
- See also change on toolbox