open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.9k stars 2.27k forks source link

k8sattributes doesn't work with both pod UID and container name in sources #32104

Open paulius-valiunas opened 5 months ago

paulius-valiunas commented 5 months ago

Component(s)

processor/k8sattributes

What happened?

Description

If I provide pod UID, container name, and namespace name, to the k8sattributes processor, it doesn't seem to work.

Steps to Reproduce

  k8sattributes/filelog:
    pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
        - from: resource_attribute
          name: k8s.container.name
        - from: resource_attribute
          name: k8s.namespace.name
    extract:
      metadata:
        - k8s.node.name
        - k8s.pod.uid
        - k8s.container.name
        - k8s.namespace.name

Expected Result

Should add k8s.node.name to the resource

Actual Result

Doesn't add anything. Doesn't log anything useful either. With debug logs enabled:

2024-03-29T16:27:32.474Z    debug   k8sattributesprocessor@v0.88.0/processor.go:108 evaluating pod identifier   {"kind": "processor", "name": "k8sattributes/filelog", "pipeline": "logs", "value": [{"Source":{"From":"resource_attribute","Name":"k8s.pod.uid"},"Value":"814b693a-dbc8-44aa-b7ad-22f919e5569c"},{"Source":{"From":"resource_attribute","Name":"k8s.container.name"},"Value":"pod-619863854748671e213bb333e56981de795eeeb0"},{"Source":{"From":"resource_attribute","Name":"k8s.namespace.name"},"Value":"imodeljs"},{"Source":{"From":"","Name":""},"Value":""}]}

After removing k8s.container.name from sources, the processor started adding the node name to my resources.

Collector version

OpenTelemetry Operator 0.42.3

Environment information

Environment

OS: (e.g., "Ubuntu 20.04") Compiler(if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

No response

Log output

No response

Additional context

No response

github-actions[bot] commented 5 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

swiatekm commented 5 months ago

I think this is because k8s.container.name is not actually added to pod attributes that association can check. If so, this should either be documented, or ideally should result in an error.

paulius-valiunas commented 5 months ago

Here's an excerpt from the documentation:

Additional container level attributes can be extracted provided that certain resource attributes are provided:

  1. If the container.id resource attribute is provided, the following additional attributes will be available:
    • k8s.container.name
    • container.image.name
    • container.image.tag
  2. If the k8s.container.name resource attribute is provided, the following additional attributes will be available:
    • container.image.name
    • container.image.tag
  3. If the k8s.container.restart_count resource attribute is provided, it can be used to associate with a particular container instance. If it's not set, the latest container instance will be used:
    • container.id (not added by default, has to be specified in metadata)

This led me to believe that k8s.container.name is supported, and that I can add it in addition to the other attributes. Sounds like that's incorrect and the README should be updated...

TylerHelmuth commented 5 months ago

@paulius-valiunas condition 1. states that k8s.container.name can be extracted as long as the container.id resource attribute is already present. Does your data have the resource attribute container.id?

This feels like a possible duplicate of https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31509, do you agree?

TylerHelmuth commented 5 months ago

If you're trying to associate the telemetry to pod data using k8s.container.name it must be present in the resource attributes already - you cannot use the processor to add k8s.container.name and use it for association at the same time.

swiatekm commented 5 months ago

@TylerHelmuth I don't think this works either way. If you look at the implementation of Pod cache updates, it goes like so:

  1. Extract Pod attributes (but not container attributes) and put them in the Attributes field.
  2. Extract container data (but not attributes, all the data) and put it in the Containers field.
  3. Determine Pod identity based on association configuration. This can use arbitrary attributes, but only ones in the Attributes field.
  4. Save Pod data under all the determined identities.

Container attributes are not calculated during this process, but instead at runtime based on the Containers field: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/c38ec5cab617bd7a83b82d66be69c62019da6987/processor/k8sattributesprocessor/processor.go#L147.

paulius-valiunas commented 5 months ago

@paulius-valiunas condition 1. states that k8s.container.name can be extracted as long as the container.id resource attribute is already present. Does your data have the resource attribute container.id?

This feels like a possible duplicate of #31509, do you agree?

I'm not trying to extract the container name, I already have it. I'm just trying to get the image name and tag.

paulius-valiunas commented 5 months ago

Also, the documentation says:

Only attribute names from metadata should be used for pod_association's resource_attribute, because empty or non-existing values will be ignored.

That's why I added the "already known" fields to both metadata and pod_association. Seems like this sentence is incorrect or in need of clarification, because I later removed everything except node name from metadata and it still worked.

TylerHelmuth commented 5 months ago

@swiatekm-sumo you're right. We should add to the README that container fields can't be used for association.

Seems like this sentence is incorrect or in need of clarification, because I later removed everything except node name from metadata and it still worked.

Yes I agree, I think that sentence is wrong and could be removed.

I'm not trying to extract the container name, I already have it. I'm just trying to get the image name and tag.

Similar to https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31509#issuecomment-1999690606, can you check that the container name on the telemetry matches the container name the processor is recording? If they don't match the other container details cannot be extracted.

I also suggest you make the pod_association section look like this:

    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection

As @swiatekm-sumo pointed out k8s.container.name won't work as an association and k8s.namespace.name isn't unique to a pod.

paulius-valiunas commented 5 months ago

We should add to the README that container fields can't be used for association.

In that case, you should remove this section too:

  1. If the k8s.container.name resource attribute is provided, the following additional attributes will be available:
    • container.image.name
    • container.image.tag

If I may ask, how could you possibly get container.image.name using this processor?

Similar to #31509 (comment), can you check that the container name on the telemetry matches the container name the processor is recording? If they don't match the other container details cannot be extracted.

Yes, I'm getting the correct container name, everything comes from Kubernetes log files.

I also suggest you make the pod_association section look like this:

    pod_association:
    - sources:
      - from: resource_attribute
        name: k8s.pod.ip
    - sources:
      - from: resource_attribute
        name: k8s.pod.uid
    - sources:
      - from: connection

I'm getting all my logs from a filelog receiver so the connection/IP-related sources won't work.

As @swiatekm-sumo pointed out k8s.container.name won't work as an association and k8s.namespace.name isn't unique to a pod.

You have a point, k8s.namespace.name isn't unique to a pod, it's only there because I was previously using it in combination with k8s.pod.name. Only the combination of both is unique. I later replaced k8s.pod.name with k8s.pod.uid but didn't remove the namespace.

TylerHelmuth commented 5 months ago

@paulius-valiunas while container fields do nothing in the pod_association section, they have meaning in the extract.metadata section.

This section of the README is accurate:

Additional container level attributes can be extracted provided that certain resource attributes are provided:

1. If the `container.id` resource attribute is provided, the following additional attributes will be available:
   - k8s.container.name
   - container.image.name
   - container.image.tag
2. If the `k8s.container.name` resource attribute is provided, the following additional attributes will be available:
   - container.image.name
   - container.image.tag
3. If the `k8s.container.restart_count` resource attribute is provided, it can be used to associate with a particular container
   instance. If it's not set, the latest container instance will be used:
   - container.id (not added by default, has to be specified in `metadata`)

Although I think we should update the word provided to something clearer. The README is trying to convey that the listed attributes, when configured in extract.metadata, can only be extracted when the named resource attribute is present in the telemetry's resource.

In your situation, you''ll be able to associate the pod logs you're collecting via the filelog receiver to the pod metadata via k8s.pod.uid. If you're using the Collector Helm Chart logsCollection preset, the filelogreceiver configuration will extract this resource attribute for you: https://github.com/open-telemetry/opentelemetry-helm-charts/blob/abe4c7dc5a5f384d5f847a0aa7bfeb12dca9ef90/charts/opentelemetry-collector/templates/_config.tpl#L276-L278 (as well as the k8s.container.name.

Your k8sattributesprocessor config can look like this:

  k8sattributes/filelog:
    pod_association:
      - sources:
        - from: resource_attribute
          name: k8s.pod.uid
    extract:
      metadata:
        - "container.image.name"
        - "container.image.tag"
paulius-valiunas commented 5 months ago

Yes, the documentation needs to be a lot clearer. What exactly does "resource attributes are provided" mean? Do they have to be in the resource but not referenced by pod_association?

In addition, I don't think "[they] do nothing in the pod_association section" is accurate - if you include them, the association doesn't work anymore, so they do something, just not what you'd expect.

Overall, some things might be hard to explain in words, but it would be great to have more examples in the documentation.

github-actions[bot] commented 3 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

bacherfl commented 2 months ago

@TylerHelmuth if this issue is still relevant I would be happy to look into this

bacherfl commented 2 months ago

I have created a PR for the readme in an attempt to make things a bit clearer and highlight potential limitations for association rules: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33800 please let me know if there is something you think needs further elaboration

github-actions[bot] commented 2 weeks ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.