kyma-project / community

Provides general guidelines, contributing, and maintaining rules for all who add content to Kyma.
https://kyma-project.io/community/
Apache License 2.0
44 stars 108 forks source link

Separation of module lifecycle and business logic statuses #899

Closed janmedrek closed 7 months ago

janmedrek commented 8 months ago

Created on 2024-03-06 by Jan Medrek (@janmedrek).

Decision log

Name Description
Title Separation of module lifecycle and business logic statuses.
Due date 2024-03-31
Status Accepted on 2024-03-31
Decision type Binary
Affected decisions None

Context

In the current design, the "state" of modules is quite blurry - it is a mix of both deployment health coming from the Kyma Lifecycle Manager and business logic status that is monitored by the module operator. This causes the need to synchronize the value of this field and take into consideration multiple factors that do not fall under the same category. The overall situation is that it is not clear what is the current state of both deployment health and the business logic of the module.

Decision

A clear separation should be introduced between two state types for each module:

Then the following values should be used in the State fields:

Consequences

Module state that is reflected in the Manifest CR's status.state field will now be used to represent the Manifest Status. This means that it will only contain information on the success/failure of the k8s resource application in the cluster and whether the deployment remains in a healthy state or not.

The Kyma CR will be also affected in the following ways:

Then for the "business" status of the modules - the operator should manage the corresponding Module CR and update the state field there to indicate the Module Status. This is a field that can be presented to the user to notify them whether the module logic is running smoothly or if some kind of manual action is needed.

There should be a clear recommendation/contract on the values used for the Module Status, including:

As an example, here is the Warning status of Eventing CR, which indicates the backend was not specified and the module is not ready to handle events:

status:
  activeBackend: ''
  conditions:
    - lastTransitionTime: '2024-03-26T07:01:17Z'
      message: Backend config is not provided. Please specify a backend.
      reason: BackendNotSpecified
      status: 'False'
      type: BackendAvailable
  specHash: 0
  state: Error

And a status section from the corresponding Kyma CR, which includes Ready for both eventing and overall Kyma state as all the modules were deployed successfully and the workload specified in the Module Manifest runs properly:

  modules:
    - channel: fast
      [...]
      name: eventing
      resource:
        apiVersion: operator.kyma-project.io/v1alpha1
        kind: Eventing
        metadata:
          name: eventing
          namespace: kyma-system
      state: Ready
      [...]
  state: Ready
k15r commented 8 months ago

Do i understand this correctly:

if so: i totally agree with the decision to have the operator maintain the business logics state and expose it to the user. This way misconfigurations can be clearly communicated to the user. But hiding the modules state in the manifest which is not visible to the user will lead to confusion. A user enables a module, but it cannot be brought to a working state (KLMs view). The user needs to be able to see this condition (eg in the kyma cr in their cluster)

janmedrek commented 8 months ago

@k15r yes, you got it right.

Regarding your concerns - I would not "hide" the Manifest Status from the user, we can display it along with the Module Status in the Busola. We just need to figure out a transparent way so it is clear "which is which".

All the Manifest Statuses will be available in Kyma CR (and aggregated to the overall Kyma State), thus the separation will be:

So all of those will be clearly visible to the user.

pbochynski commented 8 months ago

Can you add a simple example of status field of Kyma CR and corresponding module CR? I think it would be easier to grasp the idea seeing the yaml content :)

a-thaler commented 8 months ago

About the module status, I think here it is important to drill it down to the 3 colors of a traffic light, that the user is used to interpret. As the error status will be available now as a business state, I suggest to have:

An explicit "Action required" might not really needed as the error state is actionable enough.

janmedrek commented 8 months ago

@pbochynski I've added an example. I am a bit worried it may not be as transparent if both resources use state to indicate essentially two different things.

Perhaps it would be better to have something like status.moduleState in the ModuleCR?

@a-thaler thanks for the suggestion, I've added it to the proposal.

valentinvieriu commented 8 months ago

We are actually trying to unify the status across resources. Have a look how we plan to deal with them in UI https://www.figma.com/file/AkzF9I1INdwc5iUtm5fKNU/Redesign-%26-Modules?type=design&node-id=7548-81549&mode=design&t=0pDZj0xsjmYAwSj9-0 image image

valentinvieriu commented 8 months ago

@janmedrek can you add https://github.com/kyma-project/busola/issues/2624 the details of the outcome? maybe some usecases

janmedrek commented 8 months ago

Regarding the naming clash, we could switch the KLM-related state to Availability or AvailabilityState instead of plain State. There would be also an argument behind using something more sophisticated than the plain State for the user-facing part. I'd like to discuss that with the team as well.

As for values, it would make sense to support Ready/Available, Error/Unavailable, and Processing/Pending.

NHingerl commented 8 months ago

About the module status, I think here it is important to drill it down to the 3 colors of a traffic light, that the user is used to interpret. As the error status will be available now as a business state, I suggest to have:

  • Error - An error or problem has occurred. The message contains actions you need to perform to resolve the issue. Example: You configured an expired certificate.
  • Warning - A statement of fact that alerts you to a potential mistake. You may proceed with your action despite the warning. Example: You configured a certificate which is about to expire
  • Running/Healthy - No errors or warnings

An explicit "Action required" might not really needed as the error state is actionable enough.

This proposal matches the UA/UX guidelines for BTP; I fully support this.