Component(s)

collector

Describe the issue you're reporting

TL;DR - Add status conditions to the OpenTelemetryCollector resource so that, on deployment, native Kubernetes features can check the health of the custom resource and confirm it was deployed successfully. This check would happen before the collector exports any telemetry.

At the time of writing, a collector can be deployed and its pod can get stuck in CrashLoopBackOff: the underlying "Deployment" resource will show as unhealthy, but the OpenTelemetryCollector resource will still show as healthy and deployed.

In the screenshot below, the collector has failed and produces no telemetry (I purposely broke the config to get it into this state). A GitOps tool that handles syncing Kubernetes resources (in this case Argo CD) views this as a "successful deployment": the resources synced correctly and the app health check against the custom resource passed, so it is reported as synced and healthy. You can see, though, that the underlying Deployment does correctly reflect the failure, but that status is not bubbled up. If it were bubbled up, the OpenTelemetryCollector resource would have a set of status conditions like:
status:
  conditions:
    - lastTransitionTime: '2024-10-16T16:26:42Z'
      lastUpdateTime: '2024-10-16T16:26:42Z'
      message: >-
        ReplicaSet "otel-delivery-collector-5467cfd54f" has successfully
        progressed.
      reason: NewReplicaSetAvailable
      status: 'True'
      type: Progressing
    - lastTransitionTime: '2024-10-16T16:26:43Z'
      lastUpdateTime: '2024-10-16T16:26:43Z'
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: 'False'
      type: Available
From there, you can perform health checks on sync/deployment and alert early on failures where the latest deployment will produce no telemetry. See how health checks for other resources work in this Argo CD doc.
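As a sketch of how such conditions could be consumed once they exist: Argo CD supports custom health checks for CRDs via Lua scripts registered in the argocd-cm ConfigMap, keyed by group_Kind. The check below assumes the proposed Available condition shown above; nothing here exists in the operator today, it is purely illustrative:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Custom health check for the OpenTelemetryCollector CRD (group_Kind key).
  # Assumes the operator populates the proposed status.conditions.
  resource.customizations.health.opentelemetry.io_OpenTelemetryCollector: |
    hs = {}
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        if condition.type == "Available" then
          if condition.status == "True" then
            hs.status = "Healthy"
          else
            hs.status = "Degraded"
          end
          hs.message = condition.message
          return hs
        end
      end
    end
    -- No conditions yet: the operator hasn't reported status, keep waiting.
    hs.status = "Progressing"
    hs.message = "Waiting for OpenTelemetryCollector status conditions"
    return hs

With a check like this, the Application wrapping the collector would turn Degraded as soon as the Deployment loses minimum availability, instead of staying green. Outside Argo CD, something like kubectl wait --for=condition=Available on the resource would give the same early signal once the condition exists.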
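For the "alert early" part, Argo CD Notifications could then key off that computed Application health. A minimal, hypothetical trigger/template pair (the names and message are placeholders) might look like:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  # Hypothetical trigger: fire when the Application turns Degraded.
  trigger.on-collector-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [collector-degraded]
  # Hypothetical message template; wire it to your configured service.
  template.collector-degraded: |
    message: |
      {{.app.metadata.name}} is degraded: {{.app.status.health.message}}

An Application opts in with a single subscription annotation, e.g. notifications.argoproj.io/subscribe.on-collector-degraded.slack: <your-channel>.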