open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector
https://opentelemetry.io
Apache License 2.0

[exporter/elasticsearch] Duplicated data streams when using container name as index suffix #27590

Open mpostument opened 10 months ago

mpostument commented 10 months ago

Component(s)

exporter/elasticsearch, receiver/filelog

Describe the issue you're reporting

Hello, I am using the filelog receiver with the elasticsearch exporter. In the elasticsearch exporter I have enabled dynamic indexes:

      elasticsearch/logv2:
        logs_index: otel-logs-
        user: $ELASTIC_USER_V2
        password: $ELASTIC_PASSWORD_V2
        logs_dynamic_index:
          enabled: true

In the filelog receiver I am using the container name as the elasticsearch index suffix:

          - id: extract_metadata_from_filepath
            parse_from: attributes["log.file.path"]
            regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
            type: regex_parser
          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]

Most indexes in elasticsearch are fine, but some pods have container names with a random id: service-one-1696941203, service-one-1696942203, service-two-1696942203, service-two-1696941203, service-three-1696943203.

For those pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams get created in elastic. How can I handle such cases using dynamic indexes?

github-actions[bot] commented 10 months ago

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

ycombinator commented 9 months ago

I would like to take a look at this issue this week.

ycombinator commented 8 months ago

Hi @mpostument I'm starting to work on this issue and would like to clarify something:

Most indexes in elasticsearch are fine, but some pods have container names with a random id: service-one-1696941203, service-one-1696942203, service-two-1696942203, service-two-1696941203, service-three-1696943203.

For those pods I am getting a separate data stream per pod, and within a short amount of time hundreds of data streams get created in elastic. How can I handle such cases using dynamic indexes?

Looking at your configuration, the elasticsearch exporter seems to be behaving as expected. It's taking the value of attributes["elasticsearch.index.suffix"] and appending it to the value of the logs_index setting, which is otel-logs-. So it makes sense why you are ending up with several indices like otel-logs-service-one-1696941203, otel-logs-service-one-1696942203, otel-logs-service-two-1696942203, otel-logs-service-two-1696941203, otel-logs-service-three-1696943203, etc.

What would you expect or like to happen instead? Would you like all the data to go into a single index or data stream? If so, could you disable or simply not use the logs_dynamic_index setting?
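
For illustration, a minimal sketch of that single-index variant, reusing the exporter name and credentials from the config above (otel-logs is just a placeholder index name, not something from this issue):

      elasticsearch/logv2:
        logs_index: otel-logs
        user: $ELASTIC_USER_V2
        password: $ELASTIC_PASSWORD_V2
        # logs_dynamic_index left at its default (disabled), so the
        # elasticsearch.index.suffix attribute is ignored and every
        # log record goes to the single otel-logs index

With that, the number of indices no longer depends on container names at all.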

Apologies if I'm missing something obvious here. I'm a new contributor so that's quite possible. :)

mpostument commented 8 months ago

@ycombinator yes, that's right. I would want to write logs to three indexes, otel-logs-service-one, otel-logs-service-two and otel-logs-service-three, and ignore those ids. Right now I am doing this in the filelog receiver config, but over time this list keeps growing and I have to manage every individual service:

          - type: copy
            from: resource["k8s.container.name"]
            to: attributes["elasticsearch.index.suffix"]
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-one
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-one-\\d+(?:-[a-zA-Z0-9]+)*$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-two
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-two-\\d+$"'
          - type: add
            field: attributes["elasticsearch.index.suffix"]
            value: service-three
            if: 'attributes["elasticsearch.index.suffix"] matches "^service-three-\\d+$"'

ycombinator commented 8 months ago

Thanks for the clarification, @mpostument, that helps.

Forgive me again if I'm misunderstanding something because I'm still pretty new to OTel, but could you parse out the service-XXXX part from resource["k8s.container.name"] using the regex_parser operator and then assign it to the elasticsearch.index.suffix attribute using the copy operator like so?

- id: extract_service_name
  type: regex_parser
  regex: (?P<service_name>service-\w+)
  parse_from: resource["k8s.container.name"]
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]

mpostument commented 8 months ago

Yes, but this is basically what I am doing right now, just a bit differently. In my example, service-one, two and three are only examples. Services have random names with multiple dashes, like super-awesome-service-218392183, and I was not able to build a regex which would cover all of them.

ycombinator commented 8 months ago

I see. I'm not sure it is the responsibility of Elasticsearch or the Elasticsearch exporter to understand the semantics of container names. In other words, I still think parsing out the desired index suffix is outside the scope of Elasticsearch or the Elasticsearch exporter.

Services have random names with multiple dashes, like super-awesome-service-218392183, and I was not able to build a regex which would cover all of them.

It sounds to me like you want to extract just the service name from the container name and use that extracted service name as the Elasticsearch index suffix. If so, there has to be some pattern that can be used to separate the service name from the rest of the container name. Could you post a variety of container names? Perhaps I will be able to come up with a pattern to extract the service name from them.
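
For what it's worth, here is a rough sketch of one generic pattern, assuming the variable part is always a trailing numeric id (as in super-awesome-service-218392183). This is only an illustration, not something verified against your data:

- id: extract_service_name
  type: regex_parser
  # non-greedy capture keeps everything before the trailing "-<digits>" block
  regex: ^(?P<service_name>.+?)-\d+$
  parse_from: resource["k8s.container.name"]
  # container names without a trailing numeric id will not match; send them on unchanged
  on_error: send
- type: copy
  from: attributes["service_name"]
  to: attributes["elasticsearch.index.suffix"]
  on_error: send

The capture drops only the final -<digits> block, so super-awesome-service-218392183 would become super-awesome-service, while names without such a suffix keep their existing elasticsearch.index.suffix value.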

JaredTan95 commented 8 months ago

In the filelog receiver I am using the container name as the elasticsearch index suffix

This sounds a little scary, but let's say there are 100 deployments with 3 instances per deployment. Then your index count will be at least 100 * 3, and as Pods are repeatedly created, your index count will become even more terrifying.

I recommend not including frequently changing values (such as the deployment or pod name) in the index name.
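
As an illustration of that advice (an assumption about acceptable granularity, not something from the exporter docs), here is a minimal sketch that keys the suffix on the namespace rather than the container name, so index cardinality is bounded by the number of namespaces:

          - type: copy
            from: resource["k8s.namespace.name"]
            to: attributes["elasticsearch.index.suffix"]

All pods and containers in a namespace then share one index, which keeps the data stream count stable as pods are recreated.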

mpostument commented 8 months ago

@JaredTan95 what would you suggest using as the index name?

Right now I have an index per service. Even if I run a pod as a daemonset I still have one index per app. Here is my full filelog receiver config.

@ycombinator the service names are in this config:

receivers:
  filelog:
    exclude:
    - /var/log/pods/observability_observability-v2-otel-daemonset*_*/opentelemetry-collector-daemonset/*.log
    include:
    - /var/log/pods/*/*/*.log
    include_file_name: false
    include_file_path: true
    operators:
    - id: get-format
      routes:
      - expr: body matches "^\\{"
        output: parser-docker
      - expr: body matches "^[^ Z]+ "
        output: parser-crio
      - expr: body matches "^[^ Z]+Z"
        output: parser-containerd
      type: router
    - id: parser-crio
      regex: ^(?P<time>[^ Z]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: 2006-01-02T15:04:05.999999999Z07:00
        layout_type: gotime
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: crio-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-containerd
      regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: regex_parser
    - combine_field: attributes.log
      combine_with: ""
      id: containerd-recombine
      is_last_entry: attributes.logtag == 'F'
      output: extract_metadata_from_filepath
      source_identifier: attributes["log.file.path"]
      type: recombine
    - id: parser-docker
      output: extract_metadata_from_filepath
      timestamp:
        layout: '%Y-%m-%dT%H:%M:%S.%LZ'
        parse_from: attributes.time
      type: json_parser
    - id: extract_metadata_from_filepath
      parse_from: attributes["log.file.path"]
      regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
      type: regex_parser
    - from: attributes.stream
      to: attributes["log.iostream"]
      type: move
    - from: attributes.container_name
      to: resource["k8s.container.name"]
      type: move
    - from: attributes.namespace
      to: resource["k8s.namespace.name"]
      type: move
    - from: attributes.pod_name
      to: resource["k8s.pod.name"]
      type: move
    - from: attributes.restart_count
      to: resource["k8s.container.restart_count"]
      type: move
    - from: attributes.uid
      to: resource["k8s.pod.uid"]
      type: move
    - from: attributes.log
      to: body
      type: move
    - parse_from: body
      parse_to: attributes
      type: json_parser
    - from: resource["k8s.container.name"]
      to: attributes["elasticsearch.index.suffix"]
      type: copy
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "pr-job-[0-9a-f]+"
      type: add
      value: pr-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^build-service-cnt-\\d+(?:-[a-zA-Z0-9]+)*$"
      type: add
      value: build-service-cnt
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-static-file-migration-job-\\d+$"
      type: add
      value: luzok-worker-static-file-migration-job
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-job-migration-static-files-\\d+$"
      type: add
      value: luzok-worker-job-migration-static-files
    - field: attributes["elasticsearch.index.suffix"]
      if: attributes["elasticsearch.index.suffix"] matches "^luzok-worker-migration-job-db-\\d+$"
      type: add
      value: luzok-worker-job-migration-db
    start_at: beginning
    storage: file_storage

JaredTan95 commented 8 months ago

Index naming is a trade-off against your data volume and your business; there is no absolute advice.

In our scenario, we store the logs of multiple k8s clusters in a unified elasticsearch cluster, so it is acceptable for us to include the k8s cluster id in the index name (otlp-logs-{k8s-cluster-id}). Take it as a reference only.
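
For reference, the simplest form of that approach is a static index name with the cluster id baked in, one exporter per cluster; a rough sketch, where the exporter name and prod-cluster-01 are made up for illustration:

      elasticsearch/cluster-logs:
        logs_index: otlp-logs-prod-cluster-01
        # logs_dynamic_index stays disabled, so every record from this
        # collector ends up in the single cluster-scoped index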

mpostument commented 8 months ago

I tried a similar approach, but over time we had data dropped because we reached our field mapping limit. That's why I started using the index-per-app approach.

github-actions[bot] commented 6 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

djaglowski commented 6 months ago

I am removing the filelog label since to me this does not appear to be a problem with the receiver but more a question of how to properly configure for this use case. Please feel free to tell me I'm wrong if I've missed something.

github-actions[bot] commented 4 months ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] commented 1 month ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.