kubewarden / helm-charts

Helm charts for the Kubewarden project
Apache License 2.0

OpTel and Jaeger integration is not working properly #492

Closed jvanz closed 3 months ago

jvanz commented 3 months ago

During testing of the Kubewarden v1.15.0 release candidates we noticed that the Kubewarden integration with our observability stack does not work as expected with the latest OpenTelemetry (OTel) and Jaeger Helm chart versions. It looks like the configuration our Helm chart uses to deploy the Kubewarden stack no longer works: the OTel collector cannot find the Jaeger service to send tracing data to. We need to investigate whether there is a compatibility issue between the latest Jaeger and OTel versions, or whether we just need to update the configuration in our Helm charts.

These are the versions in use during the RC testing (the latest available at the time):

```
cert-manager                    cert-manager    1               2024-07-26 09:30:01.528778837 -0300 -03 deployed        cert-manager-v1.13.1            v1.13.1
jaeger-operator                 jaeger          1               2024-07-26 09:33:31.375358011 -0300 -03 deployed        jaeger-operator-2.54.0          1.57.0
kubewarden-controller           kubewarden      1               2024-07-26 09:43:30.764117351 -0300 -03 deployed        kubewarden-controller-2.3.0-rc2 v1.15.0-rc2
kubewarden-crds                 kubewarden      1               2024-07-26 09:43:27.75106705 -0300 -03  deployed        kubewarden-crds-1.7.0-rc2       v1.15.0-rc2
kubewarden-defaults             kubewarden      1               2024-07-26 09:43:50.543045482 -0300 -03 deployed        kubewarden-defaults-2.2.0-rc2   v1.15.0-rc2
my-opentelemetry-operator       open-telemetry  1               2024-07-26 09:31:27.972671429 -0300 -03 deployed        opentelemetry-operator-0.64.4   0.103.0
prometheus                      prometheus      1               2024-07-26 09:36:12.743431631 -0300 -03 deployed        kube-prometheus-stack-61.3.2    v0.75.1
```

Issue(s) found:



#### Acceptance criteria
- Find out whether there is a compatibility issue between Jaeger and OTel
- If there is, find the most recent versions that work with our current configuration and update the docs to state the maximum supported version for both dependencies
- If there is no compatibility issue between Jaeger and OTel, update our Helm chart (or any other component necessary) to make it work again
- Tracing data should be visible in the Jaeger UI using the required versions (considering the previous acceptance criteria)
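For context while investigating (not from the issue itself): recent OpenTelemetry Collector releases dropped the dedicated `jaeger` exporter in favor of native OTLP ingestion, which Jaeger supports since v1.35, so an old exporter name in the collector config is one plausible source of the breakage. A minimal `OpenTelemetryCollector` sketch that forwards traces to Jaeger over OTLP might look like the following; the collector name, namespace, and Jaeger service name are assumptions and depend on the actual deployment:

```yaml
# Sketch only: forward traces to Jaeger via OTLP instead of the removed
# `jaeger` exporter. All resource/service names below are assumptions.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: my-collector            # hypothetical name
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
    exporters:
      otlp/jaeger:
        # assumed Jaeger collector service; 4317 is the OTLP gRPC port
        endpoint: my-jaeger-collector.jaeger.svc.cluster.local:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          exporters: [otlp/jaeger]
```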
viccuad commented 3 months ago

Adding info: Kubewarden 1.14 with the OTel stack dependencies at the versions listed on doc.kubewarden.io works fine, both tracing and metrics. That is, opentelemetry-operator-0.56.0, jaeger-operator-2.49.0, kube-prometheus-stack-51.5.3. These versions are rather old, though.

kkaempf commented 3 months ago

> is not working as expected

Please, more details.

jvanz commented 3 months ago

> is not working as expected
>
> Please, more details.

> * what's the failed test case ?

None; this is not spotted by test cases.

> * how does it fail ?

> The OTel collector is not able to find the Jaeger service to send tracing data

I've rephrased that in a separate section in the issue description.

> * what is the expected output ?

The acceptance criteria have been updated to make that clearer.

> * logs?

Description updated.

viccuad commented 3 months ago

After fixing https://github.com/kubewarden/policy-server/issues/847, testing with 1.15.0-rc2, policy-server:latest, and opentelemetry stack from docs.kubewarden.io shows that everything is working as expected :).

On the latest opentelemetry stack and 1.15.0-rc2, with policy-server:latest, everything works too. Yet we hit https://github.com/jaegertracing/helm-charts/issues/549. The workaround is to expand the ClusterRole `jaeger-operator` with `get`, `list` permissions for `ingressclasses`. Will add the workaround to the e2e tests.
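The workaround above amounts to appending one rule to the existing ClusterRole. As a sketch (the ClusterRole name `jaeger-operator` comes from this comment; adjust it if the Helm release names it differently), the rule to add, e.g. via `kubectl edit clusterrole jaeger-operator`, would be:

```yaml
# Rule to append to the `rules:` list of the jaeger-operator ClusterRole,
# granting read access to IngressClasses (workaround for
# jaegertracing/helm-charts#549).
- apiGroups: ["networking.k8s.io"]
  resources: ["ingressclasses"]
  verbs: ["get", "list"]
```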

I consider this done.

What to expect when testing:

On Jaeger: there must be a service `kubewarden-policy-server` that exposes 6 operations (`validate_settings`, `validate`, `validation`, `audit`, `request`, `policy_log`). These come from the policy-server.

On Prometheus:
- `kubewarden_policy_evaluation_latency_milliseconds_sum` is present (created by policy-server)
- `kubewarden_policy_total` is present (created by kubewarden-controller)

On Grafana: with the default policies installed, targeting a ClusterAdmissionPolicy (notice the `clusterwide` prefix) `clusterwide-no-privilege-escalation`, and after an audit scanner run, each metric has some values.