keptn / lifecycle-toolkit

Toolkit for cloud-native application lifecycle management
https://keptn.sh
Apache License 2.0

SLO for Traces #508

Closed hwinkel closed 1 year ago

hwinkel commented 1 year ago

Please introduce SLO validation based on workload OTel traces. Basically, define OTel traces as SLIs that can be queried from the OTel backend (e.g. Tempo).

hwinkel commented 1 year ago

See https://tracetest.io/ for how this could look from an implementation perspective.

agardnerIT commented 1 year ago

As a first thought, your KeptnTask could call the tracetest API. Looping in the tracetest team for oversight on this. @kdhamric @jorgeepc

kdhamric commented 1 year ago

This is the second time I have seen Tracetest referenced with Keptn - first was a presentation by @bradmccoydev where he was running Tracetest as part of the test process (mentions at 11:13 and 12:03): https://www.youtube.com/watch?v=TO_d-HWXP5A.

Would love to hear a vision of how this would work. Will spend some time today working with Keptn to gain a deeper understanding.

hwinkel commented 1 year ago

@kdhamric TL;DR: Keptn (or the Keptn lifecycle controller) evaluates a set of SLOs, which describe the intended, expected result, and compares them with the actual results coming from SLIs (Service Level Indicators). Metrics are the typical source of SLIs, but an SLI can be "anything", so defining traces as SLIs that can be evaluated and compared against expectations might make sense.
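
To make the idea concrete, here is a minimal sketch of treating span durations from traces as an SLI and checking them against an SLO. Everything in it is an assumption for illustration: the function names, the thresholds, and the sample durations are hypothetical, and it does not use any Keptn or tracing-backend API. It only shows the compare-SLI-against-SLO step, using a nearest-rank percentile.

```python
def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("no span durations to evaluate")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = -(-pct * len(ordered) // 100)
    return ordered[max(rank, 1) - 1]


def evaluate_latency_slo(durations_ms, pass_ms, warn_ms, pct=95):
    """Map a latency SLI (hypothetical: span durations in ms) to pass / warning / fail."""
    sli = nearest_rank_percentile(durations_ms, pct)
    if sli <= pass_ms:
        return "pass"
    if sli <= warn_ms:
        return "warning"
    return "fail"


# Hypothetical span durations, as they might be extracted from a trace backend.
sample_durations_ms = [120, 180, 200, 250, 900]
print(evaluate_latency_slo(sample_durations_ms, pass_ms=300, warn_ms=500))
```

In a real setup the durations would come from a query against the trace backend; that query is deliberately left out here because its shape depends on the backend.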

kdhamric commented 1 year ago

@agardnerIT Do any of the materials or scripts mentioned here help or apply? https://github.com/keptn/integrations/issues/48

Spent some time this weekend to come up to speed on Keptn, and will be doing more tomorrow.

agardnerIT commented 1 year ago

Yes, potentially (I'd actually forgotten I'd written that)! However, be aware that the lifecycle toolkit is actually a separate Keptn project that aims to do things in a k8s-native and GitOps friendly way.

As the name suggests, it hooks into the lifecycle of a Deployment, so you can just run kubectl apply -f deployment.yaml and the Keptn Lifecycle Toolkit (KLT) "injects itself" into the process. That way you can run pre-deployment tasks and pre-deployment evaluations, and then post-deployment tasks and post-deployment evaluations.

Finally, via a KeptnApp CRD you can create the concept of an application: a higher-level logical grouping of multiple Deployments seen as a single KeptnApp. Once each Deployment is done, the 4 task types above can also be executed for the KeptnApp as a whole unit.

Where I see tracetest fitting nicely is during the pre-deployment evaluations (to test that any dependencies are functioning correctly) and then, primarily, during the post-deployment evaluations: after the Deployment has occurred and the pods have started, we call the tracetest API and get a pass / warning / fail for those traces.
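
As a rough sketch of that last step, the snippet below reduces a set of trace assertion outcomes to a single pass / warning / fail verdict. The result shape (AssertionResult with a critical flag) and the verdict rules are hypothetical, chosen only for illustration; this is not the tracetest response format, and no tracetest API is called.

```python
from dataclasses import dataclass


@dataclass
class AssertionResult:
    """Hypothetical outcome of one assertion checked against a trace."""
    name: str
    passed: bool
    critical: bool = True


def verdict(results):
    """Reduce assertion outcomes to 'pass', 'warning' or 'fail'."""
    if not results:
        # Nothing was verified, so do not let the evaluation succeed silently.
        return "fail"
    if any(not r.passed and r.critical for r in results):
        return "fail"
    if any(not r.passed for r in results):
        return "warning"
    return "pass"


# Hypothetical assertions for a post-deployment evaluation.
results = [
    AssertionResult("http span returns 200", passed=True),
    AssertionResult("db span under 100ms", passed=False, critical=False),
]
print(verdict(results))
```

Treating an empty result set as a failure is a design choice in this sketch: an evaluation that checked nothing should not be reported as healthy.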

The lifecycle toolkit has its own (in progress) documentation here: https://lifecycle.keptn.sh/docs/ and the readme in this repo is also good.

thisthat commented 1 year ago

Hi @hwinkel, thank you for the suggestion, this is an interesting use case. I would see this as a special EvaluationProvider.

mathnogueira commented 1 year ago

Hello everyone, @danielbdias and I are Tracetest engineers and we are doing some exploration work on how we can integrate Keptn and Tracetest. Thanks for the material shared in this thread, this will be very helpful.

danielbdias commented 1 year ago

Hi folks!

After a first look at your material, a simple way to integrate Tracetest with Keptn seems to be packaging it as a container and calling it through job-executor-service (which could later be adapted to a KeptnTask). The container could execute a test and validate whether the API being deployed is working correctly.

Using the idea of job-executor-service, we could add two resources to a project/service (a CLI configuration and a test definition) and use the following job manifest:

apiVersion: v2
actions:
 - name: "Run tracetest to validate a service"
   events:
     - name: "sh.keptn.event.test.triggered"
   tasks:
     - name: "Run tracetest"
       files:
         - data/test-definition.yaml
         - data/tracetest-cli-config.yaml
       image: "kubeshop/tracetest:latest"
       cmd:
         - sh
       args:
         - -c
         - "./tracetest --config /keptn/data/tracetest-cli-config.yaml test run --definition /keptn/data/test-definition.yaml --wait-for-result"
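
The manifest above boils down to a single CLI invocation. As a minimal sketch, the same command can be assembled as an argument list in Python, which avoids the shell quoting of the sh -c form. The function name build_tracetest_command is hypothetical; the flags and paths are copied from the manifest above and nothing else about the tracetest CLI is assumed.

```python
import shlex


def build_tracetest_command(config_path, definition_path, binary="./tracetest"):
    """Assemble the tracetest invocation from the manifest as an argv list."""
    return [
        binary,
        "--config", config_path,
        "test", "run",
        "--definition", definition_path,
        "--wait-for-result",
    ]


cmd = build_tracetest_command(
    "/keptn/data/tracetest-cli-config.yaml",
    "/keptn/data/test-definition.yaml",
)
# Print the equivalent shell command line for inspection.
print(shlex.join(cmd))
```

The list could be handed to subprocess.run(cmd) inside the container; how the exit code of the CLI should be interpreted is not covered here.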

What do you think? Does it make sense?

mathnogueira commented 1 year ago

The task that @danielbdias posted would match @agardnerIT's idea of how to use Tracetest to test a newly deployed app. I would like to know what the benefits are of using an Evaluation Provider rather than a task for testing the traces generated by an app.

thschue commented 1 year ago

Added an issue (#614) for adding the functionality of running tasks in containers.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.