cloud native - Githubissues

@cportele Some thoughts regarding #879, #880 and #881:

External dependencies

Everything that is not part of the application code but has to be accessed at runtime is an external dependency. That includes the store, any kind of database, and any kind of web service. Any external dependency is volatile by design: it is either there or it is not.

Hence we would introduce an interface Volatile with a method isAvailable that has to be implemented by classes that act as a wrapper for an external dependency. Classes further up a dependency chain that use such a wrapper class should also implement Volatile. The goal is that the actual consumer of the external dependency, which in most cases should be in the API layer, can easily check the availability.

Consumers of Volatile should be able to register a listener for state changes, so when the external dependency becomes available or unavailable, all consumers are notified. For the implementation of Volatile that means there has to be some kind of polling of the external dependency.
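A minimal sketch of how this could look. Only the names Volatile and isAvailable come from the proposal; the listener method onAvailabilityChange, the class PolledDependency and its probe are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

// Assumed shape of the proposed interface; only Volatile and isAvailable are from the proposal.
interface Volatile {
  boolean isAvailable();

  // Hypothetical listener registration: the listener receives the new availability on each change.
  void onAvailabilityChange(Consumer<Boolean> listener);
}

// Hypothetical wrapper for an external dependency that is checked through a probe.
class PolledDependency implements Volatile {
  private final BooleanSupplier probe;
  private final List<Consumer<Boolean>> listeners = new CopyOnWriteArrayList<>();
  private volatile boolean available; // unavailable until the first successful poll

  PolledDependency(BooleanSupplier probe) {
    this.probe = probe;
  }

  @Override
  public boolean isAvailable() {
    return available;
  }

  @Override
  public void onAvailabilityChange(Consumer<Boolean> listener) {
    listeners.add(listener);
  }

  // One polling step; listeners are only notified when the availability actually changes.
  synchronized void poll() {
    boolean now = probeSafely();
    if (now != available) {
      available = now;
      for (Consumer<Boolean> listener : listeners) {
        listener.accept(now);
      }
    }
  }

  // A probe that throws is treated as "dependency unavailable".
  private boolean probeSafely() {
    try {
      return probe.getAsBoolean();
    } catch (RuntimeException e) {
      return false;
    }
  }
}
```

The polling itself could be driven by something like a ScheduledExecutorService that calls poll at a fixed rate; the interval would be a configuration choice.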

Entities

There are already some thoughts in #879.

Entity types are feature providers, tile providers and APIs, where feature providers may be used by tile providers and APIs, tile providers may be used by APIs, and APIs are used by clients/users.

There are three parties that are interested in the state and health of entities:

clients and users are consumers of APIs, see APIs below
administrators and orchestrators are consumers of health checks, see below
entities may be consumers of other entities, see the rest of this paragraph

States

Every entity already has a state, one of UNKNOWN, LOADING, RELOADING, DISABLED, DEFECTIVE, ACTIVE, HEALTHY. I think these should be sufficient in general; we would only change them if the need arises during implementation.

Currently only ACTIVE entities are visible to consumers. That would change: in the future, an entity would be visible to consumers as soon as it exists. It is then up to the consumer to decide whether it is fit for the intended usage.

HEALTHY is currently not used, but will be in the future. There can only be a difference between ACTIVE and HEALTHY when an entity can be partially usable, which means it has some subcomponents with their own state. I wouldn’t allow recursive use of that; for subcomponents an availability flag should be sufficient.

As an example, a FeatureProvider has the interfaces FeatureCrs and FeatureQueries. The states might look like this:

state: HEALTHY
crs: true
queries: true

state: ACTIVE
crs: true
queries: false

state: DEFECTIVE
crs: false
queries: false

Coming back to the external dependencies from above, the reason for FeatureQueries not being available would be an absent database. So in this case SqlConnectorRx would implement Volatile, and the status check for FeatureQueries would delegate to it.
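The three example states above suggest a simple derivation rule. The following sketch assumes that rule (all subcomponents available gives HEALTHY, some give ACTIVE, none give DEFECTIVE); the rule is inferred from the examples, and the names EntityState and SubcomponentStates are hypothetical.

```java
import java.util.Map;

// Only the states relevant for the derivation; the full list has more values.
enum EntityState { DEFECTIVE, ACTIVE, HEALTHY }

// Hypothetical helper that derives an entity state from subcomponent availability flags.
final class SubcomponentStates {
  private SubcomponentStates() {}

  // Assumed rule: all available -> HEALTHY, some available -> ACTIVE, none -> DEFECTIVE.
  static EntityState derive(Map<String, Boolean> availability) {
    boolean all = availability.values().stream().allMatch(Boolean::booleanValue);
    boolean any = availability.values().stream().anyMatch(Boolean::booleanValue);
    if (all) {
      return EntityState.HEALTHY; // also covers an entity without subcomponents
    }
    if (any) {
      return EntityState.ACTIVE;
    }
    return EntityState.DEFECTIVE;
  }
}
```

With this rule, derive(Map.of("crs", true, "queries", false)) yields ACTIVE, matching the second example.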

Similar to Volatile, entities should also allow consumers to register a listener for state changes.

Startup and reloads

Every entity has an onStartup method that is also called on reload. I guess we should rename that to something like onChange. Currently when that method fails on reload, the entity switches to DEFECTIVE. For the sake of resiliency I think we have to implement a rollback, so if a reload fails the old version stays usable (at least if it was usable to begin with).
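A rough sketch of the rollback idea, not tied to the actual entity classes. It assumes that onChange can be modelled as a function that builds a new runtime state from a configuration and throws on failure; ReloadableEntity and its methods are hypothetical names.

```java
import java.util.function.Function;

// Hypothetical holder that keeps the last working runtime state of an entity.
final class ReloadableEntity<C, R> {
  private final Function<C, R> onChange; // builds the runtime state, may throw
  private R current; // null until the first successful start
  private boolean defective;

  ReloadableEntity(Function<C, R> onChange) {
    this.onChange = onChange;
  }

  // Applies a configuration; returns true if the new version is now in use.
  synchronized boolean apply(C configuration) {
    try {
      R next = onChange.apply(configuration);
      current = next; // switch only after onChange succeeded
      defective = false;
      return true;
    } catch (RuntimeException e) {
      // Rollback: a previously usable version stays in place.
      // Only an entity that was never usable becomes defective.
      if (current == null) {
        defective = true;
      }
      return false;
    }
  }

  synchronized R current() {
    return current;
  }

  synchronized boolean isDefective() {
    return defective;
  }
}
```

The key point is that the old state is only replaced after the new one has been built completely, so a failed reload never leaves the entity half-initialized.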

Listeners for Volatile or entity state changes should most likely also trigger this method. For APIs that might be trickier, since the consumers/listeners might be ApiExtensions. So we might need some way to trigger the entity’s onChange from the extensions.

Health checks

Health checks are a tool for administrators: they can check the current state of the application, collect metrics over time, or get alerted on issues. Every entity should have a health check that shows its state and subcomponent states as described above. In addition, there should be some global health checks, first and foremost for the store.

Health checks are also used by orchestrators like Kubernetes. In this case the detailed health checks are not needed; instead, two boolean flags, alive and ready, have to be derived from all the other health checks. ready should be false if the load balancer should pause sending requests to the given instance. alive should be false if the instance should be destroyed and a new one should be created.
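How the two flags are derived is still open. One possible rule, sketched below, assumes every detailed health check can be reduced to one of three hypothetical results: OK, UNAVAILABLE (may recover on its own, for example an absent database) and FATAL (will not recover without a restart). CheckResult and Probes are illustrative names.

```java
import java.util.Collection;

// Assumed reduction of a detailed health check to three outcomes.
enum CheckResult { OK, UNAVAILABLE, FATAL }

// Hypothetical derivation of the two orchestrator flags from all check results.
final class Probes {
  private Probes() {}

  // Ready only if every check is OK; otherwise the load balancer should pause requests.
  static boolean ready(Collection<CheckResult> results) {
    return results.stream().allMatch(r -> r == CheckResult.OK);
  }

  // Alive unless some check is FATAL; a FATAL result means the instance should be replaced.
  static boolean alive(Collection<CheckResult> results) {
    return results.stream().noneMatch(r -> r == CheckResult.FATAL);
  }
}
```

Under this rule an instance with an absent database is alive but not ready, so it is taken out of rotation without being destroyed.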

APIs

The API entities are not consumed by other application components, but by clients and users. In this case the state and subcomponent states have to be communicated through HTTP status codes and maybe the OpenAPI definition.

An enabled API should be considered as always online. That means 404 is reserved for disabled APIs or building blocks. When the onChange method has not finished yet, or when required providers or external dependencies are unavailable or only partially available, a 503 should be returned for the affected operations.
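As a sketch, the decision for a single operation could be reduced to three flags. ApiStatus, statusFor and the flag names are hypothetical, and 200 merely stands in for "process the request normally".

```java
// Hypothetical mapping from API and dependency state to an HTTP status code.
final class ApiStatus {
  private ApiStatus() {}

  static int statusFor(boolean enabled, boolean changeDone, boolean dependenciesAvailable) {
    if (!enabled) {
      return 404; // reserved for disabled APIs or building blocks
    }
    if (!changeDone || !dependenciesAvailable) {
      return 503; // enabled, but the operation cannot be served right now
    }
    return 200; // placeholder for normal request processing
  }
}
```

The order of the checks matters: a disabled API answers with 404 even if its dependencies are also unavailable.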

I think the main task here is to define what the relevant subcomponents are and how their state affects single operations or the API as a whole. The starting point would be the ApiExtensions, but not all of them need a state; most query parameters, for example, do not. So every ApiExtension that requires any provider or external dependency could implement Volatile. The availability could then be checked in the API’s onChange method and also, for example, in the dispatcher.

interactive-instruments / ldproxy