open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0
2.96k stars 2.3k forks source link

Reuse add_metadata_from_filepath #33912

Closed zeitlinger closed 2 months ago

zeitlinger commented 3 months ago

Component(s)

pkg/stanza, receiver/filelog

Is your feature request related to a problem? Please describe.

add_metadata_from_filepath is tied to the container operator.

Describe the solution you'd like

Would it make sense to move this feature to file consumer, so that it can be reused, e.g. in OTLP JSON File Receiver?

Describe alternatives you've considered

Could be done manually - if operators were supported for OTLP JSON File Receiver:

otlpjsonfile:
        include:
          - /var/log/pods/*/anti-fraud/*.log
        include_file_path: true
        operators:
          - id: extract_metadata_from_filepath
            type: regex_parser
            cache:
              size: 128
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          - from: attributes.restart_count
            to: resource["k8s.container.restart_count"]
            type: move
          - from: attributes.uid
            to: resource["k8s.pod.uid"]
            type: move
          - from: attributes.container_name
            to: resource["k8s.container.name"]
            type: move
          - from: attributes.namespace
            to: resource["k8s.namespace.name"]
            type: move
          - from: attributes.pod_name
            to: resource["k8s.pod.name"]
            type: move

Additional context

No response

github-actions[bot] commented 3 months ago

Pinging code owners:

ChrsMark commented 3 months ago

add_metadata_from_filepath is tied to the container operator for the reason that it's specific for the target technology: container logs from known runtimes which are written to known log file paths (usually that's the case but that's not guaranteed, that's why this is an optional setting which can be disabled).

So generalizing this, as is, for the entire file consumer sounds a bit weird unless we explicitly name it to something like add_metadata_from_pod_filepath. However, I'm not sure if that's a good practice since that might open the door for "packaging" into code patterns that can be supported through configuration. Do we want to potentially support (and encode) more "patterns" in the future? And if so what would be the acceptance criteria for this? More or less what was mentioned at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/31959#issuecomment-2023862536. Happy to hear others' opinions though.

Btw, how the log entries look like after they are parsed? I guess they are not wrapped in a container format, right?

zeitlinger commented 3 months ago

How would you feel about adding add_metadata_from_pod_filepath to OTLP JSON File Receiver?

zeitlinger commented 3 months ago

Btw, how the log entries look like after they are parsed? I guess they are not wrapped in a container format, right?

you mean for OTLP JSON File Receiver? There, they are wrapped in a container format - I've created https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33846 to deal with that

djaglowski commented 2 months ago

How about a pod_path_parser which, given a field (e.g. parse_from: attributes["log.file.path"]) and optionally a type (e.g. pod_type: docker) will parse values according to the expected format?

zeitlinger commented 2 months ago

How about a pod_path_parser which, given a field (e.g. parse_from: attributes["log.file.path"]) and optionally a type (e.g. pod_type: docker) will parse values according to the expected format?

sound good :smile:

what is pod type and what are possible values?

djaglowski commented 2 months ago

what is pod type and what are possible values?

Sorry, not the correct name. Maybe container_type or format is better?

I was thinking it would behave the same as the container parser's format setting, which accepts docker, crio, containerd, or if left blank will automatically detect the format (auto-detection is less efficient due to the necessary regex matching).

ChrsMark commented 2 months ago

I like the idea of the path parser. I think that with some refactoring we could re-use what the container operator implements so as we provide the same functionality as part of the container operator along with the standalone operator as well. Similarly to what we do with the recombine operator.

Regarding scope/naming: the container operator can potentially handle docker logs written on the disk regardless of them being orchestrated by K8s or not. So it's not 100% tied to Pods. In this, I believe the path parser should also focus on container level as well and be called container_path_parser or sth similar.

Then as far as the config option is concerned, I don't think that the format or the runtime really fits here. We can have the 3 different runtimes/formats logging at /var/log/pod/* in K8s envs (usually defined by the CRI). That's actually the case that the container operator embeds. Hence I guess we should distinguish cases based on the environment/log_file_path_type/managed_by:

  1. K8s/CRI -> /var/log/pods/*/anti-fraud/*.log
  2. Docker -> /var/lib/docker/containers/08d1c419f017398a6587fcc858b75ac09875e07213964a84c2e351012a4899f8/08d1c419f017398a6587fcc858b75ac09875e07213964a84c2e351012a4899f8-json.log (which includes the container.id as it comes from the runtime)
  3. Any other case that might occur in the future?

As it was already mentioned the type can be detected in auto mode or defined directly.

I don't have any strong preference on the naming here, I would just suggest type: {k8s-cri, docker}.

zeitlinger commented 2 months ago

great

I tried to add the functionality to https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/2f079f94462ff6aa93fd95989ee8ea89e61a526a/pkg/stanza/fileconsumer/attrs/attrs.go#L32 - but I can't distinguish between regular and resource attributes there.

attributes map[entry.FieldInterface]any is not allowed by go - but I could do map[string]FieldEntry with

type FieldEntry struct {
    Key   FieldInterface
    Value any
}

ideas?

djaglowski commented 2 months ago

I tried to add the functionality to

opentelemetry-collector-contrib/pkg/stanza/fileconsumer/attrs/attrs.go

I don't think this is what we were suggesting. The idea for adding a new parser means a new operator which is independent of fileconsumer. Maybe the closest example would be the uri parser.

zeitlinger commented 2 months ago

OK - should operators be supported by json otlp receiver then?

ChrsMark commented 2 months ago

Interesting, for some reason I took for granted that operators are also part of the otlpjsonfilereceiver at first place 😞. I'm not familiar with the otlpjsonfilereceiver so I'm not sure if operators would fit there in general. We should ensure that operators would be used in general as part of this receiver and not just rely on this specific use-case. (@atoulme please chime in if you have feedback on this)

On the other hand I'm not sure if we could do sth similar to the Header operators which are part of fileconsumer: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/fileconsumer/config.go#L94 That also feels weird because the operation we need here is not strictly coupled to the content of the file I think. @djaglowski did you have sth else in mind that we miss here maybe?

Based on the above I wonder if this specific feature should come through an ottl function since it seems to not be tightly coupled to the various receivers used to parse the content itself.

djaglowski commented 2 months ago

I forgot that operators are not supported in this receiver, but it makes sense really. The purpose of the receiver is to read in fully formatted telemetry data. It's just a mechanism for transmission really.

With that in mind, I don't care if this functionality is implemented as an operator, but I do think it should happen outside of fileconsumer. It would make sense to me as a reuseable function or package shared by the container parser and this, and perhaps later a container path parser.

ChrsMark commented 2 months ago

@djaglowski @zeitlinger shall we close this one? That was mostly needed for https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33846 but I think that should also be fine now after the otlpjson connector.

ChrsMark commented 2 months ago

Closing since https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/33846 was also closed. We can re-open if there is a specific use-case for this.

zeitlinger commented 2 months ago

@djaglowski @zeitlinger shall we close this one? That was mostly needed for #33846 but I think that should also be fine now after the otlpjson connector.

yes, the current solution is much better :smile: