zitadel / zitadel

ZITADEL - Identity infrastructure, simplified for you.
https://zitadel.com
Apache License 2.0
7.61k stars 458 forks

Zitadel OpenTelemetry traces resulting in high cardinality #8096

Open hreddy-klaviyo opened 2 weeks ago

hreddy-klaviyo commented 2 weeks ago

Preflight Checklist

Describe your problem

We set up the Zitadel config to record OTel traces and OTel metrics on each endpoint: https://github.com/zitadel/zitadel/blob/b055d1d9e67587eacce8e34649085bcd3268a055/cmd/defaults.yaml#L11

We've noticed that the cardinality of two specific operations is extremely high:

{ZITADEL_DOMAIN}/v2beta/idp_intents/<idp_intent_id>
{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/<auth_request_id>

This is because these IDs are generated on each login, so every login produces a new operation name. That is causing a lot of issues on the ingesting side, and it is further exacerbated by the metrics that are computed from these operations.

e.g.

latency_bucket{operation="{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/V2_270990429988614816",service_name="ZITADEL",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="5"} 0
latency_bucket{operation="{ZITADEL_DOMAIN}/v2beta/oidc/auth_requests/V2_270990429988614816",service_name="ZITADEL",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET",le="10"} 10
...


Describe your ideal solution

These operations should record the Auth Request IDs and IDP Intent IDs as tags (attributes) on the span rather than leaving them in the path that is used as the operation name.
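A minimal sketch of that idea, using only the Go standard library: the dynamic ID segment is replaced by a placeholder in the operation name and returned separately as a key/value pair. The function name `normalizeOperation`, the `routeTemplates` table, and the attribute keys are hypothetical and are not taken from the Zitadel code base.

```go
package main

import "strings"

// routeTemplates lists the static route prefixes from this issue together
// with a hypothetical attribute key for the ID segment that follows them.
var routeTemplates = []struct {
	prefix string // static part of the route, ending in "/"
	param  string // assumed attribute key for the dynamic segment
}{
	{"/v2beta/idp_intents/", "idp_intent_id"},
	{"/v2beta/oidc/auth_requests/", "auth_request_id"},
}

// normalizeOperation returns a low-cardinality operation name for path and
// the extracted ID (if any) as a key/value pair meant to become a span tag.
// Paths that match no template are returned unchanged with no attributes.
func normalizeOperation(path string) (string, map[string]string) {
	attrs := map[string]string{}
	for _, rt := range routeTemplates {
		if !strings.HasPrefix(path, rt.prefix) {
			continue
		}
		rest := strings.TrimPrefix(path, rt.prefix)
		if rest == "" {
			continue
		}
		// Only the first segment after the prefix is treated as the ID;
		// any trailing segments stay part of the operation name.
		id, tail, hasTail := strings.Cut(rest, "/")
		attrs[rt.param] = id
		name := rt.prefix + "{" + rt.param + "}"
		if hasTail {
			name += "/" + tail
		}
		return name, attrs
	}
	return path, attrs
}
```

With this, `normalizeOperation("/v2beta/oidc/auth_requests/V2_270990429988614816")` yields the name `/v2beta/oidc/auth_requests/{auth_request_id}` and the ID under the `auth_request_id` key. In an OpenTelemetry Go setup the pair could then be attached with `span.SetAttributes(attribute.String(key, value))`; where exactly Zitadel would hook this in is not covered here.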

Version

2.53.4

Environment

Self-hosted

Additional Context


Reducing sampling to 0.1 helped a bit to buy us some time :)

livio-a commented 2 weeks ago

@adlerhurst when checking this, we should also remove the host from the name
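As a rough illustration of removing the host from the name, the sketch below drops everything before the first path separator. `stripHost` is a hypothetical helper, and it assumes the operation name has the `host/path` shape seen in this issue (optionally with a scheme in front).

```go
package main

import "strings"

// stripHost removes an optional scheme and the host from an operation name
// of the assumed form "[scheme://]host/path", returning only the path.
// A name with no path component is reduced to "/".
func stripHost(operation string) string {
	// Drop a leading scheme such as "https://" if one is present.
	if i := strings.Index(operation, "://"); i >= 0 {
		operation = operation[i+3:]
	}
	// Everything from the first "/" onward is the path.
	if i := strings.Index(operation, "/"); i >= 0 {
		return operation[i:]
	}
	return "/"
}
```

A name that already starts with `/` is returned as-is, so the helper is safe to apply to names that never contained a host.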