open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
3.08k stars 2.37k forks source link

[receiver/k8sevents] Deprecate the k8seventsreceiver #24242

Open TylerHelmuth opened 1 year ago

TylerHelmuth commented 1 year ago

Component(s)

receiver/k8sevents, receiver/k8sobjects

Describe the issue you're reporting

With the introduction of the k8sobjectsreceiver the purpose of the k8seventsreceiver is now handle by a more generic component. I propose we deprecate the k8seventsreceiver in favor of the k8sobjectsreceiver.

Work to do:

github-actions[bot] commented 1 year ago

Pinging code owners for receiver/k8sevents: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners for receiver/k8sobjects: @dmitryax @hvaghani221. See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 year ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

TylerHelmuth commented 1 year ago

Comparing events collected by k8sobjectsreceiver and k8seventsreceiver the biggest gap we'll need to fill is that the k8sobjectsreceiver is pretty ungraceful with the way it formats the data. The k8sobjectsreceiver puts the json event into the body of the log whereas the k8seventsreceiver attempts to build a more wholistic otlp log based on the event. Specifically the k8seventsreceiver does this:

// k8sEventToLogRecord converts Kubernetes event to plog.LogRecordSlice and adds the resource attributes.
func k8sEventToLogData(logger *zap.Logger, ev *corev1.Event) plog.Logs {
    ld := plog.NewLogs()
    rl := ld.ResourceLogs().AppendEmpty()
    sl := rl.ScopeLogs().AppendEmpty()
    lr := sl.LogRecords().AppendEmpty()

    resourceAttrs := rl.Resource().Attributes()
    resourceAttrs.EnsureCapacity(totalResourceAttributes)

    resourceAttrs.PutStr(semconv.AttributeK8SNodeName, ev.Source.Host)

    // Attributes related to the object causing the event.
    resourceAttrs.PutStr("k8s.object.kind", ev.InvolvedObject.Kind)
    resourceAttrs.PutStr("k8s.object.name", ev.InvolvedObject.Name)
    resourceAttrs.PutStr("k8s.object.uid", string(ev.InvolvedObject.UID))
    resourceAttrs.PutStr("k8s.object.fieldpath", ev.InvolvedObject.FieldPath)
    resourceAttrs.PutStr("k8s.object.api_version", ev.InvolvedObject.APIVersion)
    resourceAttrs.PutStr("k8s.object.resource_version", ev.InvolvedObject.ResourceVersion)

    lr.SetTimestamp(pcommon.NewTimestampFromTime(getEventTimestamp(ev)))

    // The Message field contains description about the event,
    // which is best suited for the "Body" of the LogRecordSlice.
    lr.Body().SetStr(ev.Message)

    // Set the "SeverityNumber" and "SeverityText" if a known type of
    // severity is found.
    if severityNumber, ok := severityMap[strings.ToLower(ev.Type)]; ok {
        lr.SetSeverityNumber(severityNumber)
        lr.SetSeverityText(ev.Type)
    } else {
        logger.Debug("unknown severity type", zap.String("type", ev.Type))
    }

    attrs := lr.Attributes()
    attrs.EnsureCapacity(totalLogAttributes)

    attrs.PutStr("k8s.event.reason", ev.Reason)
    attrs.PutStr("k8s.event.action", ev.Action)
    attrs.PutStr("k8s.event.start_time", ev.ObjectMeta.CreationTimestamp.String())
    attrs.PutStr("k8s.event.name", ev.ObjectMeta.Name)
    attrs.PutStr("k8s.event.uid", string(ev.ObjectMeta.UID))
    attrs.PutStr(semconv.AttributeK8SNamespaceName, ev.InvolvedObject.Namespace)

    // "Count" field of k8s event will be '0' in case it is
    // not present in the collected event from k8s.
    if ev.Count != 0 {
        attrs.PutInt("k8s.event.count", int64(ev.Count))
    }

    return ld
}

To generalize the function, the k8seventsreceiver takes interesting parts of the log and intelligently adds them to the resource, body, attributes, and severity. We will need to update the k8sobjectsreceiver to be this intelligent as well.

dmitryax commented 1 year ago

k8sobjects receiver was designed to send raw k8s objects because the OTel logs/events Spec SIG came to a conclusion to use external schemas for this kind of 3rd-party events instead of defining OTel schemas and semantic conventions for any event coming from external sources. It was decided on one of the calls. That why the k8s objects receiver was introduced to replace k8s events receiver. If we want to change it, we should start with the spec.

Also, I heard complaints from the k8s cluster users about high cardinality attributes.

TylerHelmuth commented 1 year ago

@dmitryax do you know if there was any OTEP or other documentation that solidified that schema decision?

Can resource attributes still be set such as k8s.pod.name using the Event's Regarding fields?

TylerHelmuth commented 1 year ago

@dmitryax reviewing https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/event-api.md and https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/logs/semantic_conventions/events.md I don't see anything stopping us from proposing meaningful attributes and resource attributes for kubernetes events like the k8seventsreceiver implements.

dmitryax commented 1 year ago

I stopped attending the Log Spec SIG calls, but the last time I was there, there was an agreement to use log body for the event body and define event.domain and event.name to identify the third-party schema. @scheler, can you please update us on how this changed since that?

TylerHelmuth commented 1 year ago

@dmitryax with https://github.com/open-telemetry/opentelemetry-specification/issues/3681 close we can move forward with deciding which resource attributes the k8sobjectsreceiver should extract.

I propose we start with these resource attributes:

k8s.node.name             <- object.reportingInstance
k8s.namespace.name        <-  object.regarding.namespace
k8s.{regarding.kind}.name <- object.regarding.name
k8s.{regarding.kind}.uid  <- object.regarding.uid

Personally I would like to extract more, but I don't think there are any other preexisting k8s resource attributes for us extract.

Additionally we could consider using object.type to set the log's severity.

dmitryax commented 1 year ago

@TylerHelmuth, the suggested list sounds reasonable to me. We can start with this 👍

jinja2 commented 1 year ago

k8s.node.name <- object.reportingInstance

k8s spec notes reportIngInstance as an ID of the reportingController which might not be same as the k8s node name. I think we should leave this one out.

k8s.namespace.name <- object.regarding.namespace k8s.{regarding.kind}.name <- object.regarding.name k8s.{regarding.kind}.uid <- object.regarding.uid

Events are available in 2 api groups, core/v1 and the newer events.k8s.io/v1. The regarding field is available only in events.k8s.io/v1 api and looking at the receiver, I think unless the user explicitly setsgroup = events.k8s.io/v1 in the object config, the receiver will query events in v1 version which has the required info in involvedObject field. If we are adding new attrs, I think it makes sense for these to be available consistently for either api version.

TylerHelmuth commented 1 year ago

@jinja2 great insight as always. I will update the PR to remove the node name and handle both event versions.

sathieu commented 1 year ago

@TylerHelmuth Would you also change object.type (Normal or Warning) to a severity ? And maybe add a label (k8s.event.type).

Also I'm wondering if object.reason may be added as a label (k8s.event.reason).

TylerHelmuth commented 1 year ago

I would love to move reason to an attribute (and note), but I think we need a defined semantic convention first.

I think doing the same logic as k8seventsreceiver to set severity number and severity text is reasonable. @dmitryax @jinja2 do you agree?

jinja2 commented 1 year ago

To clarify, the changes being proposed to event records in k8sobjects receiver are as below(?) -

  1. SeverityText = event.type (normal or warning)
  2. SeverityNumber = SeverityNumberInfo(normal) or SeverityNumberWarn(warning)
  3. Attr(k8s.event.reason) = event.reason
  4. Attr(k8s.event.note) = event.note

The first 3 look reasonable to me. No. 4 though, I don't think it should be an attribute. The note field seems more appropriate in the body of logrecord. Additionally, is there a limit on the size of attribute values in the spec? I think I read it is 255 somewhere, but can't find it now. The note field in a k8s event can be upto 1k in length.

Can we introduce these enhancements behind a config flag in the k8sobjects receiver? Gives user the option to transform the events as they wish w/o having to understand what additions we are making, in case they don't want to follow our convention.

I agree that we should hash out the details of what gets extracted out of the events further. I'll take a look at the current implementation in k8sevents receiver sometime this week and think more on what is worth porting over into k8sobjects recv.

TylerHelmuth commented 1 year ago

@jinja2 as much as I want 3 and 4, I think we should only stick with 1 and 2 for now. I think 3 and 4 need more concrete acceptance from the otel semantic convention or the k8s event schema, but neither recognize k8s.event.reason or k8s.event.note.

TylerHelmuth commented 1 year ago

@jinja2 do you want to see the existing semantic convention conversions behind a config option as well?

jinja2 commented 1 year ago

@TylerHelmuth I think a flag for the existing conversion would be good so we don't interfere with any existing transformations users might have setup already. Wdyt?

TylerHelmuth commented 1 year ago

Since we have ways in other components to turn off other automatic resource attributes it is a good idea. Eventually I'd like it to be turned on by default, but we can follow our breaking change process to get there.

jinja2 commented 1 year ago

@TylerHelmuth I missed the earlier conversation from this issue which states we are trying to extract attributes defined in k8s convention.

With this context in mind, these 2 extractions

k8s.{regarding.kind}.name <- object.regarding.name
k8s.{regarding.kind}.uid <- object.regarding.uid

will introduce some new attributes which are not mentioned explicitly in the convention. Jfyi, events can also be generated for custom resources by external controllers. So regarding.kind can be a lot of different types (k8s native and custom resources).

For e.g., this is the list of object kinds with events from one of my test k8s cluster. The last kind is a custom resource maintained by an external AWS controller.

> kc get ev -A -o=json | jq '[.items[].involvedObject.kind] | unique '
[
  "ConfigMap",
  "CronJob",
  "DaemonSet",
  "Deployment",
  "HorizontalPodAutoscaler",
  "Job",
  "Node",
  "Pod",
  "PodDisruptionBudget",
  "ReplicaSet",
  "Service",
  "StatefulSet",
  "TargetGroupBinding"
]

While I don't think we should have to explicitly list all the kinds in the convention, I do see value in adding a generic section to document the recommendation that telemetry for k8s object types not explicitly noted in the convention are recommended to have attributes k8s.{k8s.object.kind}.name and k8s.{k8s.object.kind}.uid where k8s.object.kind is lowercase value of the k8s object's kind. This addition to convention might in turn warrant updates to other k8s receivers which might not be using the kind name for objects in attribute. One such discrepancy from the top of my head is between the k8scluster receiver using hpa (k8s.hpa.uid attr) instead of horizontalpodautoscaler which is what we'll see in records from the k8sobjects receiver for hpa events.

jinja2 commented 1 year ago

I can start a different issue to discuss the convention updates if this sounds like something we'd like to look into further.

TylerHelmuth commented 1 year ago

@jinja2 love it, we should definitely bring this up to the semantic convention SIG.

TylerHelmuth commented 1 year ago

Opened https://github.com/open-telemetry/semantic-conventions/issues/430 to discuss the name and uid semantic conventions for k8s objects.

github-actions[bot] commented 11 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.