keptn / lifecycle-toolkit

Toolkit for cloud-native application lifecycle management
https://keptn.sh
Apache License 2.0
322 stars 126 forks source link

Split Metric and Lifecycle into two distinct operators #823

Closed thisthat closed 1 year ago

thisthat commented 1 year ago

Goal

Split the current operator into two separate operators, one for each CRD group.

Technical Details

There are two main reasons why this change is beneficial for end users:

Furthermore, we can restrict the permissions for each operator with more fine granularity.

This split can cause issues during upgrades. In the case the new metric operator is up-and-running while the old single-operator runs, there are two controllers on the same resource that try to write. Bumping the Metric CRD version should suffice to avoid this problem.

General action plan

Acceptance Criteria

DoD

### Tasks
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/845
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/846
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/847
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/848
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/849
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/850
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/851
- [x] adapt docs
- [x] adapt examples
- [ ] ???
- [ ] profit
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/942
- [ ] https://github.com/keptn/lifecycle-toolkit/issues/960
bacherfl commented 1 year ago

Looking into this I found some questions that we need to address:

mowies commented 1 year ago

Sharing CRDs is generally not a best practise I think 😄 But didn't we plan to have some generic endpoint in the metrics operator to query metrics from? I think that would also be the place for KLT to fetch metrics from. And then the rest is handled in the metrics operator. That way we could achieve clear separation of concerns.

In any case I would suggest to keep API groups in one operator and not split them over 2 operators.

thisthat commented 1 year ago

If we can't easily have a shared "CRD", I agree with @mowies. The Providers shall be moved into the metric-operator. The lifecycle-operator should only use keptn-metric for evaluations. However, we have a shared read and not write of the KeptnEvaluationProvider

bacherfl commented 1 year ago

I think there was a plan for retrieving metrics from that generic endpoint. However, right now we also support KeptnEvaluationDefinitions referring to e.g. Prometheus and Dynatrace EvaluationProviders directly as well (i.e. not only the KeptnMetrics). So the problem is that in the metrics.KeptnMetrics we currently depend on lifecycle.KeptnEvaluationProvider, which is also a dependency of lifecycle.KeptnEvaluationDefinition.

In my opinion, the cleanest approach would be to introduce something like metrics.KeptnMetricsProvider that can be used by the metrics.KeptnMetrics controller, and only refer to metrics.KeptnMetrics in the lifecycle.KeptnEvaluations - here we could then also go via the generic endpoint, as @mowies suggested