weaveworks-experiments / kspan

Turning Kubernetes Events into spans
Apache License 2.0
Synthesise Events from Condition transitions #2

Open bboreham opened 3 years ago

bboreham commented 3 years ago

Many objects have Conditions, like these on a Deployment:

  conditions:
  - lastTransitionTime: "2021-03-01T17:21:28Z"
    lastUpdateTime: "2021-03-01T17:21:28Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2020-11-24T13:37:06Z"
    lastUpdateTime: "2021-03-10T13:48:40Z"
    message: ReplicaSet "querier-6c88c56bbf" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing

We can create additional spans from the times at which these conditions are updated. The current code in this PR just creates zero-length spans, the same as we get with Events, but it would be even better if we could put the start and end of the span at the points where the condition transitions to False and then to True.
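To illustrate the start/end idea, here is a minimal sketch in Python. It is not the code in this PR. It assumes we receive a time-ordered stream of condition observations and pairs each False transition with the next True transition of the same type. When the False was never observed, it falls back to a zero-length span. `ConditionUpdate`, `Span` and `spans_from_conditions` are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ConditionUpdate:
    """One observed condition on an object (hypothetical type for this sketch)."""
    type: str                   # e.g. "Available"
    status: str                 # "True", "False" or "Unknown"
    last_transition: datetime   # the condition's lastTransitionTime


@dataclass(frozen=True)
class Span:
    name: str
    start: datetime
    end: datetime


def spans_from_conditions(updates: Iterable[ConditionUpdate]) -> List[Span]:
    """Pair False -> True transitions per condition type into spans."""
    went_false: Dict[str, datetime] = {}           # type -> time it went False
    last_seen: Dict[str, Tuple[str, datetime]] = {}
    spans: List[Span] = []
    for u in updates:
        key = (u.status, u.last_transition)
        if last_seen.get(u.type) == key:
            continue                               # same transition seen again
        last_seen[u.type] = key
        if u.status == "False":
            # Remember the earliest unmatched False for this type.
            went_false.setdefault(u.type, u.last_transition)
        elif u.status == "True":
            # If we never saw the False, start == end (zero-length span).
            start = went_false.pop(u.type, u.last_transition)
            spans.append(Span(u.type, start, u.last_transition))
    return spans
```

The zero-length fallback means a missed update degrades to the current behaviour rather than producing a span with a made-up start time.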

The PR works by starting a Watch() on any object we have received an Event from, and on anything in its owner chain, to see Conditions change over time. Still to do: drop the watch on objects that haven't done anything in a while.
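The bookkeeping described above might look roughly like the following Python sketch. It does not call the Kubernetes API. Objects are plain string keys, `owners` is a made-up child-to-owner map, and `owner_chain`, `WatchRegistry` and `max_idle` are hypothetical names used only to show walking the owner chain and expiring idle watches (the "still to do" part).

```python
from datetime import datetime, timedelta
from typing import Dict, List


def owner_chain(obj: str, owners: Dict[str, str]) -> List[str]:
    """Return obj followed by its owner, its owner's owner, and so on."""
    chain = [obj]
    seen = {obj}
    while True:
        parent = owners.get(chain[-1])
        if parent is None or parent in seen:   # top of chain, or a cycle
            return chain
        chain.append(parent)
        seen.add(parent)


class WatchRegistry:
    """Tracks which objects are being watched and when they were last active."""

    def __init__(self, max_idle: timedelta) -> None:
        self.max_idle = max_idle
        self.last_activity: Dict[str, datetime] = {}

    def touch(self, obj: str, now: datetime) -> bool:
        """Record activity; return True if a new watch would need starting."""
        is_new = obj not in self.last_activity
        self.last_activity[obj] = now
        return is_new

    def expire(self, now: datetime) -> List[str]:
        """Forget objects idle for longer than max_idle and return them."""
        idle = [o for o, t in self.last_activity.items()
                if now - t > self.max_idle]
        for o in idle:
            del self.last_activity[o]
        return idle
```

On each incoming Event, the caller would `touch` every entry of `owner_chain(...)` and start a watch for those that return True. A periodic call to `expire` returns the objects whose watches could be stopped.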

We adjust timestamps from condition transitions that have arrived promptly, the same as we do for Events, to give a more fine-grained display in tracing.
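One plausible reading of this adjustment, sketched in Python: `lastTransitionTime` is serialised with one-second resolution, so when an update reaches us soon after the reported time, the local receive time is a finer-grained estimate of when the transition happened. The function name `adjust_timestamp` and the one-second `threshold` are assumptions for illustration, not values taken from the PR.

```python
from datetime import datetime, timedelta


def adjust_timestamp(
    reported: datetime,
    received: datetime,
    threshold: timedelta = timedelta(seconds=1),  # assumed cut-off for "promptly"
) -> datetime:
    """Prefer the local receive time when the update arrived promptly."""
    delay = received - reported
    if timedelta(0) <= delay < threshold:
        return received   # sub-second precision from the local clock
    return reported       # late or clock-skewed: trust the reported time
```

For example, a transition reported at 17:21:28 and received at 17:21:28.350 would be placed at 17:21:28.350, while one received twelve seconds later would keep its reported time.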

Note that the "Events" that are synthesised are just used as a convenient internal object to turn into spans; they aren't meant to look like real Events. For instance, the ObjectMeta is populated with info useful for debugging.

The unit test for this feature builds on the capture/playback mechanism added in #32.

Example, from Cluster-API: image

pavolloffay commented 3 years ago

Is my understanding correct that this just adds a new "data-source" to create spans from?

bboreham commented 3 years ago

Conceptually, yes.

In some ways it should be better than Events, since we could get a better start/end timestamp. But in practice it seems we don’t always get each update, so we see a condition reset without seeing it set.

It’s unmerged since it never quite worked right.