azahnen closed this issue 5 days ago.
@cportele Some thoughts regarding #879, #880 and #881:
Everything that is not part of the application code but has to be accessed at runtime is an external dependency. That includes the store, any kind of database and any kind of web service. Any external dependency is volatile by design: it is either there or it is not.
Hence we would introduce an interface Volatile with a method isAvailable that has to be implemented by every class that acts as a wrapper for an external dependency. Classes further along a dependency chain that use such a wrapper should also implement Volatile. The goal is that the actual consumer of the external dependency, which in most cases should be in the API layer, can easily check its availability.
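A minimal Java sketch of the proposed interface, assuming isAvailable takes no arguments and returns a boolean. DatabaseWrapper and QueryService are hypothetical classes, and the BooleanSupplier stands in for whatever connectivity check a real wrapper would perform. It shows how a class further along a dependency chain can implement Volatile simply by delegating to the wrappers it uses.

```java
import java.util.function.BooleanSupplier;

// Sketch of the proposed interface; only the names Volatile and isAvailable
// come from the proposal, the signature is an assumption.
interface Volatile {
  boolean isAvailable();
}

// Hypothetical wrapper around an external dependency (e.g. a database).
// The probe is a placeholder for a real connectivity check.
class DatabaseWrapper implements Volatile {
  private final BooleanSupplier probe;

  DatabaseWrapper(BooleanSupplier probe) {
    this.probe = probe;
  }

  @Override
  public boolean isAvailable() {
    return probe.getAsBoolean();
  }
}

// Hypothetical class further down the dependency chain: it has no state of
// its own and is available exactly when all of its Volatile dependencies are.
class QueryService implements Volatile {
  private final Volatile[] dependencies;

  QueryService(Volatile... dependencies) {
    this.dependencies = dependencies;
  }

  @Override
  public boolean isAvailable() {
    for (Volatile dependency : dependencies) {
      if (!dependency.isAvailable()) {
        return false;
      }
    }
    return true;
  }
}
```

For example, a QueryService built on a DatabaseWrapper whose probe returns false reports itself as unavailable, so the API layer only has to ask the outermost object.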
Consumers of Volatile should be able to register a listener for state changes, so that all consumers are notified when the external dependency becomes available or unavailable. For implementations of Volatile, that means some kind of polling of the external dependency is required.
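One way to add listeners and polling, sketched in Java under the assumption that a listener receives the new availability as a Boolean. PollingVolatile, onStateChange, start and poll are hypothetical names; the probe again stands in for a real check (for example a cheap test query against the database). Listeners are only notified when the availability actually flips, not on every poll.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Hypothetical: Volatile extended with listener registration.
interface Volatile {
  boolean isAvailable();

  void onStateChange(Consumer<Boolean> listener);
}

// Polls a probe periodically and notifies listeners when availability changes.
class PollingVolatile implements Volatile, AutoCloseable {
  private final BooleanSupplier probe;
  private final List<Consumer<Boolean>> listeners = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();
  private volatile boolean available;

  PollingVolatile(BooleanSupplier probe) {
    this.probe = probe;
    this.available = safeProbe();
  }

  // Starts periodic polling on the single scheduler thread.
  void start(long period, TimeUnit unit) {
    scheduler.scheduleAtFixedRate(this::poll, period, period, unit);
  }

  // Runs one check; package-private so tests can trigger it directly.
  // Listeners must not throw: an uncaught exception cancels further scheduled runs.
  void poll() {
    boolean now = safeProbe();
    if (now != available) {
      available = now;
      listeners.forEach(listener -> listener.accept(now));
    }
  }

  private boolean safeProbe() {
    try {
      return probe.getAsBoolean();
    } catch (RuntimeException e) {
      return false; // a failing check counts as "not available"
    }
  }

  @Override
  public boolean isAvailable() {
    return available;
  }

  @Override
  public void onStateChange(Consumer<Boolean> listener) {
    listeners.add(listener);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Using a single-threaded scheduler keeps the compare-and-notify step in poll free of races as long as poll is only called from that thread; the polling period is left to the caller, since it presumably differs between a database and a remote web service.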
There are already some thoughts in #879.
The entity types are feature providers, tile providers and APIs. Feature providers may be used by tile providers and APIs, tile providers may be used by APIs, and APIs are used by clients and users.
There are three parties that are interested in the state and health of entities:
Every entity already has a state, one of UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE or HEALTHY. I think in general these should be sufficient; we would only change them if the need arises during implementation.
Currently only ACTIVE entities are visible to consumers. That would change: in the future, an entity would be visible to consumers as soon as it exists. It is then up to the consumer to decide whether it is fit for the intended usage.
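Sketched in Java, the existing states map naturally onto an enum. ConsumerPolicy is a hypothetical helper illustrating the proposed change in visibility: every existing entity is handed to consumers, and each consumer decides which states it accepts.

```java
import java.util.EnumSet;
import java.util.Set;

// The entity states listed above.
enum EntityState {
  UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY
}

// Hypothetical consumer-side check: the consumer, not the entity registry,
// decides which states are good enough for its intended usage.
class ConsumerPolicy {
  private final Set<EntityState> acceptable;

  ConsumerPolicy(EnumSet<EntityState> acceptable) {
    this.acceptable = EnumSet.copyOf(acceptable);
  }

  boolean isFitForUse(EntityState state) {
    return acceptable.contains(state);
  }
}
```

A consumer that can work with a partially usable provider might accept EnumSet.of(EntityState.ACTIVE, EntityState.HEALTHY), while a stricter one would only accept HEALTHY.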
HEALTHY is currently not used, but will be in the future. ACTIVE and HEALTHY can only differ when an entity can be partially usable, i.e. when it has subcomponents with their own state. I wouldn't allow nesting this recursively; a simple availability flag should be sufficient for subcomponents.
As an example, a FeatureProvider has the interfaces FeatureCrs and FeatureQueries. Its states might look like this:
state: HEALTHY
crs: true
queries: true

state: ACTIVE
crs: true
queries: false

state: DEFECTIVE
crs: false
queries: false
Coming back to the external dependencies from above, the reason for FeatureQueries not being available would be an absent database. So in this case SqlConnectorRx would implement Volatile, and the availability check for FeatureQueries would delegate to it.
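The FeatureProvider example and the delegation to SqlConnectorRx can be combined in a short Java sketch. SqlConnectorRx here is a minimal stand-in, not the real class, and FeatureProviderSketch is hypothetical; the subcomponents are plain boolean flags, in line with not allowing recursive states. Mapping mixed flags to ACTIVE is an assumption: the examples above do not cover crs being unavailable while queries is available.

```java
import java.util.function.BooleanSupplier;

interface Volatile {
  boolean isAvailable();
}

enum EntityState {
  UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY
}

// Minimal stand-in for the SQL connector: the only Volatile in the chain,
// since it wraps the database. The probe replaces a real connection check.
class SqlConnectorRx implements Volatile {
  private final BooleanSupplier databaseReachable;

  SqlConnectorRx(BooleanSupplier databaseReachable) {
    this.databaseReachable = databaseReachable;
  }

  @Override
  public boolean isAvailable() {
    return databaseReachable.getAsBoolean();
  }
}

// Hypothetical feature provider with two flat subcomponent flags.
class FeatureProviderSketch {
  private final SqlConnectorRx connector;
  private final BooleanSupplier crsAvailable;

  FeatureProviderSketch(SqlConnectorRx connector, BooleanSupplier crsAvailable) {
    this.connector = connector;
    this.crsAvailable = crsAvailable;
  }

  boolean crs() {
    return crsAvailable.getAsBoolean();
  }

  // FeatureQueries has no state of its own: it delegates to the connector.
  boolean queries() {
    return connector.isAvailable();
  }

  // Assumed rule: all flags -> HEALTHY, none -> DEFECTIVE, otherwise ACTIVE.
  EntityState state() {
    boolean crs = crs();
    boolean queries = queries();
    if (crs && queries) {
      return EntityState.HEALTHY;
    }
    if (!crs && !queries) {
      return EntityState.DEFECTIVE;
    }
    return EntityState.ACTIVE;
  }
}
```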
Similar to Volatile, entities should also allow consumers to register a listener for state changes.
Every entity has an onStartup method that is also called on reload. I guess we should rename it to something like onChange. Currently, when that method fails on reload, the entity switches to DEFECTIVE. For the sake of resiliency, I think we have to implement a rollback, so that if a reload fails, the old version stays usable (at least if it was usable to begin with).
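A minimal Java sketch of onChange with rollback, assuming the entity's configuration can be held as an immutable value. ReloadableEntity and its initializer (a Predicate that returns false or throws on failure) are hypothetical. The new version is only swapped in after it has been set up successfully; otherwise a previously usable entity keeps its old version and state, and only an entity that was never usable becomes DEFECTIVE.

```java
import java.util.function.Predicate;

enum EntityState {
  UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY
}

// Hypothetical entity holding an immutable configuration snapshot C.
class ReloadableEntity<C> {
  private final Predicate<C> initializer; // returns false (or throws) on failure
  private C current;
  private EntityState state = EntityState.UNKNOWN;

  ReloadableEntity(Predicate<C> initializer) {
    this.initializer = initializer;
  }

  // Called on startup and on every reload.
  synchronized void onChange(C candidate) {
    EntityState previous = state;
    boolean wasUsable = previous == EntityState.ACTIVE || previous == EntityState.HEALTHY;
    state = current == null ? EntityState.LOADING : EntityState.RELOADING;

    boolean succeeded;
    try {
      succeeded = initializer.test(candidate);
    } catch (RuntimeException e) {
      succeeded = false;
    }

    if (succeeded) {
      current = candidate; // swap in the new version only after success
      state = EntityState.ACTIVE;
    } else if (wasUsable) {
      state = previous; // rollback: keep serving the old version
    } else {
      state = EntityState.DEFECTIVE; // nothing usable to fall back to
    }
  }

  synchronized C current() {
    return current;
  }

  synchronized EntityState state() {
    return state;
  }
}
```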
Listeners for Volatile or entity state changes should most likely also trigger this method. For APIs that might be trickier, since the consumers/listeners might be ApiExtensions. So we might need some way to trigger the entity's onChange from the extensions.
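A Java sketch of entity state listeners that re-trigger onChange, assuming listeners receive the new EntityState. ObservableEntity, ProviderSketch and ApiSketch are hypothetical. changeTrigger hints at one possible answer for ApiExtensions: instead of a reference to the API, an extension could be handed a callback that runs the API's onChange.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

enum EntityState {
  UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY
}

// Hypothetical base class: consumers register state listeners, as with Volatile.
abstract class ObservableEntity {
  private final List<Consumer<EntityState>> listeners = new CopyOnWriteArrayList<>();
  private volatile EntityState state = EntityState.UNKNOWN;

  void addStateListener(Consumer<EntityState> listener) {
    listeners.add(listener);
  }

  EntityState state() {
    return state;
  }

  // Notifies listeners only on an actual change.
  protected void setState(EntityState newState) {
    if (newState != state) {
      state = newState;
      listeners.forEach(listener -> listener.accept(newState));
    }
  }

  // Called on startup, on reload and whenever a dependency changes.
  abstract void onChange();
}

// Hypothetical provider whose availability is toggled from outside.
class ProviderSketch extends ObservableEntity {
  private boolean available;

  void setAvailable(boolean available) {
    this.available = available;
    onChange();
  }

  @Override
  void onChange() {
    setState(available ? EntityState.ACTIVE : EntityState.DEFECTIVE);
  }
}

// Hypothetical API entity that re-runs onChange when its provider changes.
class ApiSketch extends ObservableEntity {
  private final ObservableEntity provider;

  ApiSketch(ObservableEntity provider) {
    this.provider = provider;
  }

  void start() {
    provider.addStateListener(newState -> onChange());
    onChange();
  }

  // Callback an extension could use to request a re-evaluation.
  Runnable changeTrigger() {
    return this::onChange;
  }

  @Override
  void onChange() {
    EntityState p = provider.state();
    boolean usable = p == EntityState.ACTIVE || p == EntityState.HEALTHY;
    setState(usable ? EntityState.ACTIVE : EntityState.DEFECTIVE);
  }
}
```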
Health checks are a tool for administrators: they can check the current state of the application, collect metrics over time or get alerted on issues. Every entity should have a health check that shows its state and subcomponent states as described above. In addition, there should be some global health checks, first and foremost for the store.
Health checks are also used by orchestrators like Kubernetes. In this case the detailed health checks are not needed; instead, two boolean flags, alive and ready, have to be derived from all the other health checks. ready should be false if the load balancer should pause sending requests to the given instance. alive should be false if the instance should be destroyed and replaced by a new one.
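Deriving alive and ready could look roughly like this in Java. The rule is an assumption, not part of the proposal: alive follows the global store check, and ready additionally requires that no entity is still UNKNOWN or LOADING. DEFECTIVE entities deliberately affect neither flag, presumably because restarting the instance or routing traffic elsewhere would not bring back a missing database. EntityHealth and Probes are hypothetical types; EntityHealth also shows the shape of a detailed per-entity health check (state plus subcomponent flags).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum EntityState {
  UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY
}

// Hypothetical detailed health report of one entity, as seen by administrators.
final class EntityHealth {
  final String id;
  final EntityState state;
  final Map<String, Boolean> subcomponents;

  EntityHealth(String id, EntityState state, Map<String, Boolean> subcomponents) {
    this.id = id;
    this.state = state;
    this.subcomponents = Collections.unmodifiableMap(new HashMap<>(subcomponents));
  }
}

// Hypothetical condensed view for an orchestrator.
final class Probes {
  final boolean alive;
  final boolean ready;

  private Probes(boolean alive, boolean ready) {
    this.alive = alive;
    this.ready = ready;
  }

  // Assumed rule: alive = store healthy; ready = alive and startup finished.
  static Probes derive(boolean storeHealthy, List<EntityHealth> entities) {
    boolean alive = storeHealthy;
    boolean startupDone = entities.stream()
        .noneMatch(e -> e.state == EntityState.UNKNOWN || e.state == EntityState.LOADING);
    return new Probes(alive, alive && startupDone);
  }
}
```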
The API entities are not consumed by other application components, but by clients and users. In this case the state and subcomponent states have to be communicated through HTTP status codes and maybe the OpenAPI definition.
An enabled API should always be considered online. That means 404 is reserved for disabled APIs or building blocks. When the onChange method has not completed yet, or required providers or external dependencies are unavailable or only partially available, a 503 should be returned for the affected operations.
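The status code rules above fit in a small Java function. OperationStatus and its parameters are hypothetical; 200 stands for "handle the request normally", since the actual success status depends on the operation.

```java
// Hypothetical mapping from API and operation state to an HTTP status code.
final class OperationStatus {
  static final int OK = 200;
  static final int NOT_FOUND = 404;
  static final int SERVICE_UNAVAILABLE = 503;

  private OperationStatus() {}

  static int statusFor(
      boolean apiEnabled,
      boolean buildingBlockEnabled,
      boolean onChangeDone,
      boolean dependenciesAvailable) {
    // 404 is reserved for disabled APIs or building blocks.
    if (!apiEnabled || !buildingBlockEnabled) {
      return NOT_FOUND;
    }
    // An enabled API is always "online"; temporary problems yield 503.
    if (!onChangeDone || !dependenciesAvailable) {
      return SERVICE_UNAVAILABLE;
    }
    return OK;
  }
}
```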
I think the main task here is to define what the relevant subcomponents are and how their state affects single operations or the API as a whole. I think the starting point would be the ApiExtensions, but, for example, most query parameters do not need a state. So every ApiExtension that requires any provider or external dependency could implement Volatile. The availability could then be checked in the API's onChange method and also, for example, in the dispatcher.
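A Java sketch of how a dispatcher might use Volatile extensions. ApiExtension here is a simplified stand-in rather than the real interface, and path, StaticExtension, ProviderBackedExtension and Dispatcher are hypothetical. Extensions without a state (for example most query parameters) simply do not implement Volatile and are always considered usable; allVolatileExtensionsAvailable is the kind of aggregate check an API's onChange could use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

interface Volatile {
  boolean isAvailable();
}

// Simplified stand-in for an API extension; path() is hypothetical.
interface ApiExtension {
  String path();
}

// Stateless extension: always usable.
class StaticExtension implements ApiExtension {
  private final String path;

  StaticExtension(String path) {
    this.path = path;
  }

  @Override
  public String path() {
    return path;
  }
}

// Extension that needs a provider, so it implements Volatile.
class ProviderBackedExtension implements ApiExtension, Volatile {
  private final String path;
  private final BooleanSupplier providerAvailable;

  ProviderBackedExtension(String path, BooleanSupplier providerAvailable) {
    this.path = path;
    this.providerAvailable = providerAvailable;
  }

  @Override
  public String path() {
    return path;
  }

  @Override
  public boolean isAvailable() {
    return providerAvailable.getAsBoolean();
  }
}

// Hypothetical dispatcher; only enabled extensions are registered, so
// "no match" covers disabled building blocks.
class Dispatcher {
  private final List<ApiExtension> extensions;

  Dispatcher(List<? extends ApiExtension> extensions) {
    this.extensions = new ArrayList<>(extensions);
  }

  // Naive prefix matching, first match wins.
  int dispatch(String path) {
    for (ApiExtension extension : extensions) {
      if (path.startsWith(extension.path())) {
        return isUsable(extension) ? 200 : 503;
      }
    }
    return 404;
  }

  boolean allVolatileExtensionsAvailable() {
    return extensions.stream().allMatch(Dispatcher::isUsable);
  }

  private static boolean isUsable(ApiExtension extension) {
    return !(extension instanceof Volatile) || ((Volatile) extension).isAvailable();
  }
}
```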
Roadmap to level up ldproxy from cloud ready to cloud native: