wasmCloud / wadm

wasmCloud Application Deployment Manager (wadm) is a Wasm-native orchestrator for managing and scaling declarative wasmCloud applications.
https://wasmcloud.com
Apache License 2.0

[FEAT] Support exponential backoff in the wrapped BackoffAwareScaler type #253

Closed: brooksmtownsend closed this issue 1 month ago

brooksmtownsend commented 6 months ago

Inside the scaler logic we have the notion of a BackoffAwareScaler, which takes a very naïve approach to backoff. For specific commands that might take a long time (ProviderStarted, ComponentScaled), we prevent that scaler from sending commands either for 30 seconds or until we receive an event that specifically responds to that scaler's command. For example, after a provider start command we expect that provider to either start or fail to start, with a corresponding event.
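A minimal sketch of the current behavior, for illustration only. The `Event` enum, the `NaiveBackoff` type, and its method names are hypothetical simplifications, not wadm's actual API. Only the fixed 30-second window and the rule that an expected event lifts the block come from the description above. Time is passed in explicitly so the logic is easy to exercise.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, simplified events; wadm's real event types carry much more data.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    ProviderStarted { provider_id: String },
    ProviderStartFailed { provider_id: String },
    HostHeartbeat,
}

/// The fixed window described in the issue.
pub const FIXED_BACKOFF: Duration = Duration::from_secs(30);

/// Sketch of the current "naive" backoff: after a long-running command,
/// block the scaler until the window elapses or an expected event arrives.
#[derive(Default)]
pub struct NaiveBackoff {
    blocked_until: Option<Instant>,
    expected: Vec<Event>,
}

impl NaiveBackoff {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record that the wrapped scaler just sent a long-running command.
    pub fn command_sent(&mut self, now: Instant, expected: Vec<Event>) {
        self.blocked_until = Some(now + FIXED_BACKOFF);
        self.expected = expected;
    }

    /// Any expected event, success or failure, lifts the block early.
    pub fn observe(&mut self, event: &Event) {
        if self.expected.contains(event) {
            self.blocked_until = None;
            self.expected.clear();
        }
    }

    /// Whether the wrapped scaler may send commands at `now`.
    pub fn can_send(&self, now: Instant) -> bool {
        self.blocked_until.map_or(true, |until| now >= until)
    }
}
```

Note that a failure event (here the hypothetical `ProviderStartFailed`) lifts the block just as a success does, so a command that fails instantly can also be re-sent instantly.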

The real problem this solves is preventing a scaler from thrashing in response to events that might be relevant. What it doesn't solve is the more general problem of thrashing in response to events that are relevant. Imagine a scaler attempting to start a Wasm component that lives in a private registry, on a wasmCloud host that has no credentials for it. The scaler publishes the command, the host almost immediately fails to authenticate, and a component_scale_failed event is emitted. The scaler sees that the component failed to scale and, being the dumb scaler that it is (it doesn't look at the error type), immediately tries to start it again. Rust is fast, so we'll keep retrying forever, or until someone notices the increased load.

My proposal is to wrap every scaler in the BackoffAware structure, so that outside of the scaler logic we can keep a backoff timer for repeated commands. We want to make sure an individual scaler can still reconcile immediately when state has actually changed, but when it is hopelessly sending the same command over and over, we can apply an exponential (power of two, Fibonacci, etc.) backoff before sending the next command.
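A rough sketch of the proposed wrapper-level backoff, assuming a hypothetical `RepeatBackoff` type that sits between a scaler's reconcile output and whatever publishes commands. It treats "the same command list as last time" as a stand-in for "state was not actually modified", uses a power-of-two schedule capped at `max`, and resets as soon as the commands change or the scaler stops asking for anything. How wadm would actually detect repeated commands is an assumption here.

```rust
use std::time::{Duration, Instant};

/// Hypothetical wrapper-level backoff for a scaler that keeps emitting the
/// same commands. Not wadm's real type; a sketch of the proposal only.
pub struct RepeatBackoff<C> {
    base: Duration,
    max: Duration,
    /// Commands sent on the previous attempt, if any.
    last: Option<Vec<C>>,
    /// How many times the identical command list has been re-sent.
    attempts: u32,
    /// Earliest time an identical command list may be re-sent.
    next_allowed: Option<Instant>,
}

impl<C: PartialEq + Clone> RepeatBackoff<C> {
    pub fn new(base: Duration, max: Duration) -> Self {
        Self { base, max, last: None, attempts: 0, next_allowed: None }
    }

    /// Power-of-two delay: base * 2^attempts, capped at `max`.
    fn delay(&self) -> Duration {
        // Checked arithmetic avoids overflow on very long failure streaks.
        let factor = 2u32.checked_pow(self.attempts).unwrap_or(u32::MAX);
        self.base.checked_mul(factor).unwrap_or(self.max).min(self.max)
    }

    /// Filters the commands produced by one reconcile pass at time `now`.
    /// Returns the commands to publish, or an empty Vec if they are held back.
    pub fn filter(&mut self, now: Instant, commands: Vec<C>) -> Vec<C> {
        if commands.is_empty() {
            // The scaler is satisfied: forget any failure streak.
            self.last = None;
            self.attempts = 0;
            self.next_allowed = None;
            return commands;
        }
        if self.last.as_ref() == Some(&commands) {
            // Identical to the previous attempt: wait out the current delay.
            if matches!(self.next_allowed, Some(t) if now < t) {
                return Vec::new();
            }
            self.attempts = self.attempts.saturating_add(1);
        } else {
            // Different commands imply state changed: send right away.
            self.last = Some(commands.clone());
            self.attempts = 0;
        }
        self.next_allowed = Some(now + self.delay());
        commands
    }
}
```

Switching to a Fibonacci schedule would only change `delay()`; the reset-on-change logic is what keeps a scaler responsive when state genuinely changes.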

stale[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this has been closed too eagerly, please feel free to tag a maintainer so we can keep working on the issue. Thank you for contributing to wasmCloud!

brooksmtownsend commented 4 months ago

I still want this, so I'll bump the stale bot for now.
