keptn / lifecycle-toolkit

Toolkit for cloud-native application lifecycle management
https://keptn.sh
Apache License 2.0

How to integrate with Argo Rollouts #3409

Closed: mowies closed this issue 1 week ago

mowies commented 3 months ago

Goal

Find out which Keptn feature would be a good candidate for integration into Argo Rollouts. For this MVP, the integration should be as straightforward as possible.

Details

DoD

mowies commented 1 week ago

Research results - Argo Rollouts

Possible use cases

The following use cases were identified for integrating Argo Rollouts with Keptn:

Keptn metrics provider

Introducing Keptn as a metrics provider for Argo Rollouts analysis is a straightforward integration. Argo analyses can read metric values from KeptnMetric resources. This lets a user reuse the same query (for any of the metrics providers Keptn supports) across multiple Argo AnalysisTemplates without copying it into each one: the AnalysisTemplate contains only an object reference to the KeptnMetric, which holds the query and provides the data. The KeptnMetric can also be referenced from a different namespace, so multiple Argo analyses can share the same object. This also hides which metrics provider Keptn uses (e.g. Prometheus, Dynatrace, or Datadog) behind the KeptnMetric.

From a technical point of view, Argo Rollouts provides a Provider interface, which must be implemented to support Keptn as a metrics provider. The new provider must also be added to the ProviderFactory, which supplies the Kubernetes client used to fetch the required resources (KeptnMetric).

In addition, the Argo AnalysisTemplate CRD needs to be modified: a new field must be added to the MetricProvider struct to support the new Keptn provider. The resulting resource could look like the following:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: analysis-namespace
spec:
  args:
  - name: service-name
  metrics:
  - name: success-rate
    interval: 5m
    successCondition: result[0] >= 0.95
    failureLimit: 3
    provider:
      keptn:
        name: keptn-metric-success-rate
        namespace: keptn-metric-namespace

Observability of Argo Rollouts using Keptn

Another integration option is for Keptn to provide observability for Argo Rollouts. Although Argo Rollouts already provides a dashboard that shows the status of a Rollout, it does not show the reason for a potential failure, nor DORA metrics. However, integrating Keptn observability with Argo Rollouts is technically very challenging: given Keptn's current implementation and functionality, it would require changes to Keptn's architecture.

The problem is that Keptn cannot display multiple ReplicaSets of a single Rollout as part of a single KeptnApp. Both ReplicaSets of the Rollout share the same workload name, so they map to the same KeptnWorkload, and Keptn does not allow a single KeptnApp to contain the same KeptnWorkload in two different versions. As a result, the trace shows the Rollout as two separate, unconnected KeptnAppVersions. Supporting this use case would therefore require major changes to Keptn's core functionality.

A technically less challenging option may be to link the two existing KeptnAppVersions of a single Rollout via the spanLinks of a KeptnAppContext. This would at least connect the traces of a single Rollout. It would require a new controller in Keptn that observes ReplicaSets belonging to Rollouts, fetches the span ID and span link of the KeptnWorkload representing each ReplicaSet, and creates a KeptnAppContext that links the next Rollout of that ReplicaSet to the previous one.