[BUG] Opensearch dynamic index not working with nested fields

opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.

https://opensearch.org/docs/latest/clients/data-prepper/index/

Apache License 2.0


[BUG] Opensearch dynamic index not working with nested fields #2259

Open skylagallm opened 1 year ago

skylagallm commented 1 year ago

Describe the bug: Using the latest 2.1.0 version with the OpenSearch dynamic index, Data Prepper does not work when the index name references a nested field inside the event.

To Reproduce Steps to reproduce the behavior:

Add an attribute to each span in otel collector:

processors:
  resource:
    attributes:
    - key: tenant
      value: "test"
      action: insert

otel will insert this custom key as:

resource.attributes.tenant

In Data Prepper, configure the OpenSearch sink to use the nested field:

  sink:
    - opensearch:
        hosts: ["https://endpoint.eu-west-1.es.amazonaws.com:443"]
        aws_sigv4: true
        aws_region: "eu-west-1"
        index: "otel-v1-apm-span-${resource.attributes.tenant}-%{yyyy.MM.dd}"
        index_type: "custom"
        trace_analytics_raw: true

Expected behavior: I would expect Data Prepper to create an index named otel-v1-apm-span-test-2023.02.09, but it doesn't. By the way, if I use a top-level field like "serviceName", everything works fine.

dlvenable commented 1 year ago

@skylagallm, Data Prepper uses JSON Pointer syntax, so paths use slashes instead of dots.

In your example, ${resource.attributes.tenant} will look for a top-level field with the name resource.attributes.tenant. You might want to change to use ${/resource/attributes/tenant}. Please let us know if this helps or if you have other issues with your use-case.
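The difference between a slash-separated path and a dotted key can be sketched in a few lines of Python. This is only an illustration of the lookup rule described above, not Data Prepper's code: `resolve_pointer` is a hypothetical helper, and it skips the `~0`/`~1` escaping that full JSON Pointer defines.

```python
def resolve_pointer(event, pointer):
    """Resolve a slash-separated path against a nested dict.

    Hypothetical helper: each slash-separated segment is treated as one key,
    so dots inside a segment are part of the key name, not separators.
    Returns None when any segment is missing.
    """
    node = event
    for segment in pointer.lstrip("/").split("/"):
        if not isinstance(node, dict) or segment not in node:
            return None
        node = node[segment]
    return node


nested_event = {"resource": {"attributes": {"tenant": "test"}}}
dotted_event = {"resource.attributes.tenant": "test"}

# Slashes walk down the nested objects.
print(resolve_pointer(nested_event, "/resource/attributes/tenant"))
# A dotted name is a single segment, so it only matches a literal top-level key.
print(resolve_pointer(nested_event, "resource.attributes.tenant"))
print(resolve_pointer(dotted_event, "resource.attributes.tenant"))
```

The same path string therefore succeeds or fails depending on whether the event stores the value as nested objects or under one key that happens to contain dots.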

skylagallm commented 1 year ago

Hi @dlvenable, thanks for your support.

It's probably my fault; I did not explain the scenario well. Here is an example of a trace being produced:

  {
    "_index": "otel-v1-apm-span-prod-2023.02.10",
    "_type": "_doc",
    "_id": "_zG9O4YBsiGrimWmpeEN",
    "_score": 7.8146687,
    "_source": {
      "traceId": "0000000000000000010ab281baab0603",
      "droppedLinksCount": 0,
      "kind": "SPAN_KIND_SERVER",
      "droppedEventsCount": 0,
      "traceGroupFields": {
        "endTime": "2023-02-10T14:32:07.924255Z",
        "durationInNanos": 42000,
        "statusCode": 0
      },
      "traceGroup": "HTTP GET /config",
      "serviceName": "frontend",
      "parentSpanId": "",
      "spanId": "010ab281baab0603",
      "traceState": "",
      "name": "HTTP GET /config",
      "startTime": "2023-02-10T14:32:07.924213Z",
      "links": [],
      "endTime": "2023-02-10T14:32:07.924255Z",
      "droppedAttributesCount": 0,
      "durationInNanos": 42000,
      "events": [],
      "span.attributes.http@url": "/config?nonse=0.6082269370660489",
      "resource.attributes.client-uuid": "6fb38d634aad1c4d",
      "resource.attributes.host@name": "ip-10-149-171-188.prod.XXXXX.aws",
      "resource.attributes.service@name": "frontend",
      "span.attributes.component": "net/http",
      "status.code": 0,
      "span.attributes.sampler@param": true,
      "span.attributes.http@method": "GET",
      "resource.attributes.ip": "10.149.171.188",
      "resource.attributes.opencensus@exporterversion": "Jaeger-Go-2.30.0",
      "resource.attributes.obs-tenant": "my-application",
      "span.attributes.http@status_code": 200,
      "span.attributes.sampler@type": "const"
    }
  }

As you can see, the field of interest is a top-level field with dots in its name.

I have also just run a test with the following config:

entry-pipeline:
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "raw-pipeline"
    - pipeline:
        name: "service-map-pipeline"
raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - opensearch:
        hosts: ["https://ENDPOINT.eu-west-1.es.amazonaws.com:443"]
        aws_sigv4: true
        aws_region: "eu-west-1"
        index: "otel-v1-apm-span-${/resource/attributes/obs-tenant}-%{yyyy.MM.dd}"
        index_type: "custom"
        trace_analytics_raw: true
service-map-pipeline:
  workers: 8
  delay: "100"
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - service_map_stateful:
        # The window duration is the maximum length of time the data prepper stores the most recent trace data to evaluate service-map relationships.
        # The default is 3 minutes, this means we can detect relationships between services from spans reported in last 3 minutes.
        # Set higher value if your applications have higher latency.
        window_duration: 180
  buffer:
      bounded_blocking:
         # buffer_size is the number of ExportTraceRequest from otel-collector the data prepper should hold in memory.
         # We recommend to keep the same buffer_size for all pipelines.
         # Make sure you configure sufficient heap
         # default value is 512
         buffer_size: 512
         # This is the maximum number of request each worker thread will process within the delay.
         # Default is 8.
         # Make sure buffer_size >= workers * batch_size
         batch_size: 8
  sink:
    - opensearch:
        hosts: ["https://ENDPOINT.eu-west-1.es.amazonaws.com:443"]
        aws_sigv4: true
        aws_region: "eu-west-1"
        index_type: trace-analytics-service-map

However, this is not working.

graytaylor0 commented 1 year ago

Hi @skylagallm,

If you are using an Amazon OpenSearch domain, then you also need to provide auth (username and password if FGAC is enabled with username password) or an aws_sts_role_arn that has permissions to write to the domain.

The other opensearch sink configuration you have is incorrect. The boolean value of trace_analytics_raw was removed in Data Prepper 2.0. It is now specified in the form of index_type: trace-analytics-raw. So you can either configure a custom index_type like you've done (and remove the trace_analytics_raw boolean), or use an index_type of trace-analytics-raw, but not both. It seems your use case would prefer the custom index, but I will note that if you would like to use the OpenSearch Dashboards Trace Analytics plugin, not using the built-in index_type will cause issues.

As for your use case with the dynamic indices, I have tested locally and confirmed that this configuration works and creates an index named test_test_value_four. Note that this is when the top-level key is the literal resource.attributes.tenant:

log-pipeline:
  source:
    log_generator:
      total_log_count: 1
      log_type:
        apache_clf:
  processor:
    - add_entries:
        entries:
          - key: "/resource.attributes.tenant"
            value: "test_value_four"
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: [ "https://localhost:9200" ]
        index: "test_${resource.attributes.tenant}"
        insecure: true
        username: "admin"
        password: "admin"

The index test_test_value_four was created in this case, and the document in OpenSearch looks like this:

{
  "_index": "test_test_value_four",
  "_type": "_doc",
  "_id": "gEI1TIYBUY0COHOjueUw",
  "_version": 1,
  "_score": null,
  "_source": {
    "message": "51.120.173.23 agapetus xUwuOhfd [02/Jun/2022:14:51:38 -0500] \"GET /list HTTP/1.0\" 500 4777",
    "resource.attributes.tenant": "test_value_four",
    "@timestamp": "2023-02-13T13:17:10.961-06:00"
  },
  "fields": {
    "@timestamp": [
      "2023-02-13T19:17:10.961Z"
    ]
  },
  "sort": [
    1676315830961
  ]
}

I have also confirmed that setting resource.attributes.tenant as a nested key works:

log-pipeline:
  source:
    log_generator:
      total_log_count: 1
      log_type:
        apache_clf:
  processor:
    - add_entries:
        entries:
          - key: "/resource/attributes/tenant"
            value: "test_value_three"
    - date:
        from_time_received: true
        destination: "@timestamp"
  sink:
    - opensearch:
        hosts: [ "https://localhost:9200" ]
        index: "test_${resource/attributes/tenant}"
        insecure: true
        username: "admin"
        password: "admin"

The index created in OpenSearch was named test_test_value_three, and the document looks like this

{
  "_index": "test_test_value_three",
  "_type": "_doc",
  "_id": "X0IxTIYBUY0COHOjtOWA",
  "_version": 1,
  "_score": null,
  "_source": {
    "message": "159.106.50.144 volodislavu 5iItfsXu [11/Nov/2022:07:12:57 -0600] \"PUT /apps/cart.jsp?appID= HTTP/1.0\" 200 4963",
    "resource": {
      "attributes": {
        "tenant": "test_value_three"
      }
    },
    "@timestamp": "2023-02-13T13:12:47.773-06:00"
  },
  "fields": {
    "@timestamp": [
      "2023-02-13T19:12:47.773Z"
    ]
  },
  "sort": [
    1676315567773
  ]
}

So whether the key is top-level or nested, your use case should be covered. It might help if you could share the JSON Event that is input to Data Prepper, or debug with the stdout sink to verify that the key you want to inject into the index name exists as expected.
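How an index pattern such as "test_${resource/attributes/tenant}" or "otel-v1-apm-span-${...}-%{yyyy.MM.dd}" turns into a concrete index name can be sketched as below. This is a rough Python illustration under stated assumptions, not the sink's implementation: `lookup` and `expand_index` are hypothetical names, and only the yyyy, MM, and dd date tokens that appear in this thread are handled.

```python
import re
from datetime import datetime, timezone


def lookup(event, path):
    """Walk a slash-separated path; raise KeyError if a segment is missing."""
    node = event
    for segment in path.lstrip("/").split("/"):
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(path)
        node = node[segment]
    return node


# Assumption: only the date tokens used in this thread are translated.
_DATE_TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def expand_index(template, event, now):
    """Substitute ${path} from the event and %{date pattern} from `now`."""

    def field(match):
        return str(lookup(event, match.group(1)))

    def date(match):
        fmt = match.group(1)
        for token, directive in _DATE_TOKENS.items():
            fmt = fmt.replace(token, directive)
        return now.strftime(fmt)

    name = re.sub(r"\$\{([^}]+)\}", field, template)
    return re.sub(r"%\{([^}]+)\}", date, name)


nested = {"resource": {"attributes": {"tenant": "test_value_three"}}}
literal = {"resource.attributes.tenant": "test_value_four"}
now = datetime(2023, 2, 13, tzinfo=timezone.utc)

print(expand_index("test_${resource/attributes/tenant}", nested, now))
print(expand_index("test_${resource.attributes.tenant}", literal, now))
print(expand_index("span-${/resource/attributes/tenant}-%{yyyy.MM.dd}", nested, now))
```

If the referenced key is absent from the event, `lookup` raises here; in a real pipeline, writing the event to the stdout sink is the way to check which keys actually exist.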

skylagallm commented 1 year ago

Hi @graytaylor0,

Thanks for your effort in trying to replicate the behaviour locally; however, I am now a bit confused. I will try to summarize my findings:

Let's say that this (as returned by the stdout sink, without any processing) is the input of my trace/service map pipeline:

{
    "traceId": "63eba5dd078286ac04eead2dc53a34f6",
    "droppedLinksCount": 0,
    "kind": "SPAN_KIND_INTERNAL",
    "droppedEventsCount": 0,
    "traceGroupFields": {
        "endTime": null,
        "durationInNanos": null,
        "statusCode": null
    },
    "traceGroup": null,
    "serviceName": "guess-the-number",
    "parentSpanId": "a13d68b8efdbe850",
    "spanId": "9f208ffd9f5c9406",
    "traceState": "",
    "name": "jinja2.render",
    "startTime": "2023-02-14T15:16:45.588937Z",
    "links": [],
    "endTime": "2023-02-14T15:16:45.589067Z",
    "droppedAttributesCount": 0,
    "durationInNanos": 130000,
    "events": [],
    "@timestamp": "2023-02-14T15:16:50.019Z",
    "span.attributes.telemetry@sdk@version": "1.15.0",
    "span.attributes.telemetry@auto@version": "0.36b0",
    "resource.attributes.service@name": "guess-the-number",
    "status.code": 0,
    "instrumentationScope.version": "0.36b0",
    "instrumentationScope.name": "opentelemetry.instrumentation.jinja2",
    "span.attributes.telemetry@sdk@name": "opentelemetry",
    "span.attributes.jinja2@template_name": "index.html",
    "span.attributes.telemetry@sdk@language": "python",
    "span.attributes.service@name": "guess-the-number",
    "span.attributes.otel@scope@version": "0.36b0",
    "span.attributes.otel@scope@name": "opentelemetry.instrumentation.jinja2",
    "resource.attributes.tenant": "dbbatchscan"
}

The last field, "resource.attributes.tenant": "dbbatchscan", is the one I would like to use to segregate traces into different indexes in OpenSearch.

If I try this setup:

raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
    - add_entries:
        entries:
        - key: "/resource.attributes.tenant"
          value: "test"
          overwrite_if_key_exists: true

  sink:
    - stdout:
    - opensearch:
        hosts: ["https://ENDPOINT.eu-west-1.es.amazonaws.com:443"]
        aws_sigv4: true
        aws_region: "eu-west-1"
        index: "prova_${/resource.attributes.tenant}"
        index_type: "custom"

This works perfectly (including the indexing into OpenSearch, which relies on the IAM role attached to the Data Prepper EC2 instance). That is because I am forcing the creation of the field inside the processor (an overwrite in this case, since the field already exists).

But, as you can see from the input shown above, the field resource.attributes.tenant already exists, and without the add_entries processor the exact same config does not work for me.

This suggests that the OpenSearch sink cannot access fields that already exist in the event. But that cannot be the whole story, because if I build the index name from a top-level field without any dots in its name, the config works (same input doc as above):

raw-pipeline:
  source:
    pipeline:
      name: "entry-pipeline"
  processor:
    - otel_trace_raw:
  sink:
    - stdout:
    - opensearch:
        hosts: ["https://ENDPOINT.eu-west-1.es.amazonaws.com:443"]
        aws_sigv4: true
        aws_region: "eu-west-1"
        index: "prova_${/serviceName}"
        index_type: "custom"

I have also tried some workarounds, like copying the value from the resource.attributes.tenant field to another field called "test", but this does not work either.

Many thanks for your help

Marco

kkondaka commented 1 year ago

@skylagallm , if you use index: "prova_${serviceName}" it should work. Please try it and let me know. If the serviceName is inside another field like "xyz" : { "serviceName" : "xxx"} then you should use index: "prova_${xyz/serviceName}".

fkirwin commented 1 year ago

I wanted to put a comment on here because I am seeing similar behavior, albeit not in the same exact context.

I am trying to set up some routes for metrics. Under certain conditions the data will go to different indices.

When using a root attribute such as serviceName, I am able to successfully filter and route records. Example

route:
    - "metrics": /serviceName  == "myapp"

But when I try to do this with any attribute that is not in the root, the routing is not applied: all records, whether or not they meet the criteria, either go to the destination or are blocked. I tried many more configs than the ones below, with no success. Example

route: 
    - "metrics": /resource/attributes/service/namespace  == 1 (doesn't work)
    - "metrics": /resource.attributes.service@namespace == 1 (doesn't work)

I also attempted this with other processors such as copy_values and could not get it to work either. I think some of the confusion here stems from the data source. The working examples use a log generator. In my case, and apparently in @skylagallm's case too, the data comes from OTEL. Something about the serialization handling is preventing the plugins from reading or accessing these attributes.

kjorg50 commented 1 year ago

+1 to @fkirwin's issue regarding conditional routing. I'm also trying to setup routing logic based on span attributes from an OTEL data source. My routes are defined like so:

  route:
    - super-tenant: '/span.attributes.tenant == "super"'
    - other-tenant: '/span.attributes.tenant != "super"'

When I run this in my dev environment, I see data prepper logs like this:

    2023-03-29T21:33:32.931-07:00   2023-03-30T04:33:32,921 [raw-pipeline-processor-worker-5-thread-2] ERROR org.opensearch.dataprepper.pipeline.router.RouteEventEvaluator - Failed to evaluate route. This route will not be applied to any events.
    2023-03-29T21:33:32.931-07:00   org.opensearch.dataprepper.expression.ExpressionEvaluationException: Unable to evaluate statement "/span.attributes.tenant == "super""
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ConditionalExpressionEvaluator.evaluate(ConditionalExpressionEvaluator.java:48) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ConditionalExpressionEvaluator.evaluate(ConditionalExpressionEvaluator.java:18) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.router.RouteEventEvaluator.findMatchedRoutes(RouteEventEvaluator.java:64) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.router.RouteEventEvaluator.evaluateEventRoutes(RouteEventEvaluator.java:45) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.router.Router.route(Router.java:39) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.Pipeline.publishToSinks(Pipeline.java:261) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.ProcessWorker.postToSink(ProcessWorker.java:117) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.ProcessWorker.doRun(ProcessWorker.java:98) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.pipeline.ProcessWorker.run(ProcessWorker.java:45) ~[data-prepper-core-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
    2023-03-29T21:33:32.931-07:00   at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
    2023-03-29T21:33:32.931-07:00   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
    2023-03-29T21:33:32.931-07:00   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
    2023-03-29T21:33:32.931-07:00   at java.lang.Thread.run(Unknown Source) ~[?:?]
    2023-03-29T21:33:32.931-07:00   Caused by: org.opensearch.dataprepper.expression.ParseTreeCompositeException
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ParseTreeParser.createParseTree(ParseTreeParser.java:78) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:101) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ParseTreeParser.parse(ParseTreeParser.java:27) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:35) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.MultiThreadParser.parse(MultiThreadParser.java:20) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ConditionalExpressionEvaluator.evaluate(ConditionalExpressionEvaluator.java:37) ~[data-prepper-expression-2.1.1.jar:?]
    2023-03-29T21:33:32.931-07:00   ... 13 more
    2023-03-29T21:33:32.931-07:00   Caused by: org.opensearch.dataprepper.expression.ExceptionOverview: Multiple exceptions (3)
    2023-03-29T21:33:32.931-07:00   |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    2023-03-29T21:33:32.931-07:00   at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)
    2023-03-29T21:33:32.931-07:00   |-- java.lang.NullPointerException: Throwable was null!
    2023-03-29T21:33:32.931-07:00   at org.opensearch.dataprepper.expression.ParseTreeCompositeException.mapNullToNullPointer(ParseTreeCompositeException.java:35)
    2023-03-29T21:33:32.931-07:00   |-- org.antlr.v4.runtime.LexerNoViableAltException: null
    2023-03-29T21:33:32.931-07:00   at org.antlr.v4.runtime.atn.LexerATNSimulator.failOrAccept(LexerATNSimulator.java:309)

I'm not sure how the antlr runtime parses these expressions, but could it be that this type of dot notation is not supported? Should I create a separate bug for this? It would be useful if an additional example was added to the documentation for a scenario like this 🙂

kkondaka commented 1 year ago

@kjorg50 Data Prepper flattens the "attributes" and puts them in the parent object. So, if you are trying to match any fields inside the attributes, use it without the attributes. So try something like - "metrics": /resource/service/namespace == 1

nickrab commented 1 year ago

@kkondaka,

I also am having the same issue as @kjorg50 and others in this ticket. When parsing opentelemetry attributes for spans or resources they get transformed within the OtelProtoCodec into dot notation: https://github.com/opensearch-project/data-prepper/blob/e1ea5e126fcb392a5fd71be0e75a279acc162162/data-prepper-plugins/otel-proto-common/src/main/java/org/opensearch/dataprepper/plugins/otel/codec/OTelProtoCodec.java#L99

As the opening of this ticket describes, it's straightforward to reproduce this by using an OpenTelemetry Collector that generates attributes:

processors:
  resource:
    attributes:
    - key: tenant
      value: "test"
      action: insert

As far as I can tell that makes them impossible to address utilizing the conditional routing feature, as the antlr parser fails on the dot notation.

I additionally attempted to get around this by renaming the fields within the processor using the copy_values functionality. Addressing the keys this way does not work either; the from_key is never read.

    - copy_values:
        entries:
          - from_key: "/resource.attributes.tenant"
            to_key: "myHappyField"
            overwrite_if_to_key_exists: true

While I appreciate @graytaylor0's efforts earlier to show how these operations function, they all work for me when I am using other pipelines; it's the OpenTelemetry attributes that appear to be the problem.

I can open a separate ticket for these if it makes sense, but they all appear somewhat related.

How can dataprepper get a pointer to an opentelemetry attribute in an event under any component of a pipeline?

nickrab commented 1 year ago

Following up to help others experiencing this issue:

Attributes parsed out of OTEL input are transformed in the following way: they are prefixed with span.attributes or resource.attributes, followed by a dot, then the attribute name with its dots transformed to @. Finally, those are put inside a top-level JSON object called attributes.

So, to access a resource attribute such as my.tenant, the path would be /attributes/resource.attributes.my@tenant.
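The naming rule can be written down as a small Python sketch. The helper names are hypothetical; the plural `attributes` prefix follows the key names visible in the stored documents in this thread (for example resource.attributes.service@name), and the real logic lives in OTelProtoCodec.

```python
def otel_attribute_key(scope, name):
    """Build the flattened key for an OTEL attribute.

    scope is "span" or "resource"; dots in the attribute name become "@".
    """
    return f"{scope}.attributes.{name.replace('.', '@')}"


def otel_attribute_pointer(scope, name):
    """Path to the attribute inside the event's top-level `attributes` object."""
    return "/attributes/" + otel_attribute_key(scope, name)


print(otel_attribute_pointer("resource", "my.tenant"))
print(otel_attribute_pointer("resource", "service.namespace"))
print(otel_attribute_pointer("span", "http.method"))
```

The resulting string is what goes inside `${...}` in an index name or into a processor key, rather than the dotted name shown in the OpenSearch document.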

patrick-fa commented 1 year ago

@nickrab You are a life saver! I spent a week trying to figure out how to access/reference these attribute values in data prepper before stumbling across your reply here.

Adding a little more help for others who might come across this. Looking at the resulting documents saved in OpenSearch, the path to some of the fields you might want to access in a processor is not obvious. @nickrab's reply inspired me to start digging through the data prepper code, and I found the data prepper span interface, which shows what fields are available on the underlying event object.

Digging a little further, we can look at OTelProtoCodec.parseSpan() which takes an OTEL span and converts it to the data prepper span. The key is where it builds the attributes, and you can see it is merging the span, resource, instrumentation scope, and status "attributes" into the single top-level object.

I went nuts trying to figure out how to remove the instrumentationScope.* and status.* fields (because I don't care about them in my OpenSearch index and want to minimize data storage). Not only can you reference values in the top-level attributes, you can also remove them with delete_entries. Here's what I finally got to work:

otel-traces-pipeline:
  processor:
    - otel_traces:
    - delete_entries:
        with_keys:
          - "attributes/instrumentationScope.version"
          - "attributes/instrumentationScope.name"
          - "attributes/status.code"
          - "attributes/status.message"
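What those with_keys paths do to the in-pipeline event can be pictured with a short Python sketch. It is an illustration only: `delete_entry` is a hypothetical helper, and the sample event assumes the layout described above, with the OTEL-derived fields held in a top-level attributes object.

```python
def delete_entry(event, path):
    """Remove the key addressed by a slash-separated path.

    Returns True if something was deleted, False if the path did not exist.
    """
    *parents, leaf = path.lstrip("/").split("/")
    node = event
    for segment in parents:
        node = node.get(segment) if isinstance(node, dict) else None
        if node is None:
            return False
    if isinstance(node, dict) and leaf in node:
        del node[leaf]
        return True
    return False


# Assumed in-pipeline shape: dotted names are keys inside `attributes`.
event = {
    "serviceName": "frontend",
    "attributes": {
        "status.code": 0,
        "instrumentationScope.name": "opentelemetry.instrumentation.jinja2",
        "resource.attributes.tenant": "test",
    },
}

print(delete_entry(event, "attributes/status.code"))
print(delete_entry(event, "attributes/instrumentationScope.name"))
# Addressed without the `attributes/` prefix, the key is not found.
print(delete_entry(event, "status.code"))
print(event)
```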

mvillafuertem commented 3 months ago

Hi @dlvenable any update around this?

@nickrab @patrick-fa I'm using Amazon OpenSearch. Can you confirm how I should compose the JSON path of the variable?

I have already tried these three options

indexName: otel-logs-${/resource/attributes/service.namespace}-${/resource/attributes/service.name}-%{yyyy.MM}

indexName: otel-logs-${/resource/attributes/service@namespace}-${/resource/attributes/service@name}-%{yyyy.MM}

The OTel log file looks something like this:

{
  "resourceLogs": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.namespace",
            "value": {
              "stringValue": "qa"
            }
          },
          {
            "key": "service.name",
            "value": {
              "stringValue": "app"
            }
          }
        ]
      },
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "observedTimeUnixNano": "1722943858480949454",
              "body": {
                "stringValue": "log"
              },
              "attributes": [
                {
                  "key": "log.file.path",
                  "value": {
                    "stringValue": "/var/log/messages"
                  }
                },
                {
                  "key": "log.file.name",
                  "value": {
                    "stringValue": "messages"
                  }
                },
                {
                  "key": "application",
                  "value": {
                    "stringValue": "app"
                  }
                },
                {
                  "key": "env",
                  "value": {
                    "stringValue": "qa"
                  }
                },
                {
                  "key": "region",
                  "value": {
                    "stringValue": "eu-west-2"
                  }
                },
                {
                  "key": "ui_version",
                  "value": {
                    "stringValue": "v1"
                  }
                },
                {
                  "key": "version",
                  "value": {
                    "stringValue": "v1.0"
                  }
                }
              ],
              "traceId": "",
              "spanId": ""
            }
          ]
        }
      ]
    }
  ]
}

mvillafuertem commented 3 months ago

I confirm that this works

Thanks @nickrab

indexName: otel-logs-${/attributes/resource.attributes.service@namespace}-${/attributes/resource.attributes.service@name}-%{yyyy.MM}

bitzhouwen commented 3 months ago

@patrick-fa thank you for your in-depth explanation from the source code; it helped me deal with OTLP logs as well. In my situation, I'd like to use a dynamic index with the field "resource.attributes.k8s@namespace@name". I tried to reference this field the way others did and failed. Just as you mentioned, the correct field name is "attributes/resource.attributes.k8s@namespace@name", since it has been merged into the single top-level attributes object. To help others experiencing this issue, here is my log format:

{
  "traceId": "",
  "spanId": "",
  "severityText": "",
  "flags": 0,
  "time": "2024-08-27T10:32:10.381325333Z",
  "severityNumber": 0,
  "droppedAttributesCount": 0,
  "serviceName": null,
  "body": "E0827 10:32:10.381177       1 reflector.go:150] k8s.io/client-go@v0.30.0/tools/cache/reflector.go:232: Failed to watch *v1.VolumeSnapshotContent: failed to list *v1.VolumeSnapshotContent: the server could not find the requested resource (get volumesnapshotcontents.snapshot.storage.k8s.io)",
  "observedTime": "2024-08-27T10:32:10.427180527Z",
  "schemaUrl": "",
  "namespace": "kube-system",
  "log.attributes.time": "2024-08-27T10:32:10.381325333Z",
  "resource.attributes.k8s@namespace@name": "kube-system",
  "resource.attributes.app@kubernetes@io/version": "1.33.0",
  "resource.attributes.k8s@deployment@name": "ebs-csi-controller",
  "resource.attributes.k8s@container@name": "csi-snapshotter",
  "log.attributes.logtag": "F",
  "resource.attributes.app": "ebs-csi-controller",
  "resource.attributes.app@kubernetes@io/managed-by": "EKS",
  "resource.attributes.k8s@pod@start_time": "2024-08-26T08:23:15Z",
  "log.attributes.log@iostream": "stderr",
  "resource.attributes.k8s@pod@name": "ebs-csi-controller-647cfb485c-d8c2k",
  "log.attributes.log@file@path": "/var/log/pods/kube-system_ebs-csi-controller-647cfb485c-d8c2k_9c10f84f-e618-4d71-a57a-372786861e01/csi-snapshotter/0.log",
  "resource.attributes.k8s@pod@uid": "9c10f84f-e618-4d71-a57a-372786861e01",
  "resource.attributes.k8s@node@name": "ip-10-90-15-147.cn-northwest-1.compute.internal",
  "resource.attributes.k8s@container@restart_count": "0",
  "resource.attributes.app@kubernetes@io/component": "csi-driver",
  "resource.attributes.pod-template-hash": "647cfb485c",
  "resource.attributes.app@kubernetes@io/name": "aws-ebs-csi-driver"
}
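One way to reconcile the flat document above with the attributes/... path is to assume that the entries of the event's attributes object are lifted to the top level when the event is written out. That is an inference from the comments in this thread, not something verified against the Data Prepper source, and `to_stored_document` below is a hypothetical name used only for illustration in Python.

```python
def to_stored_document(event):
    """Lift the entries of `attributes` to the top level of the output document.

    Assumption: this mirrors how the flat documents in this thread come about.
    """
    doc = {key: value for key, value in event.items() if key != "attributes"}
    doc.update(event.get("attributes", {}))
    return doc


in_pipeline_event = {
    "namespace": "kube-system",
    "attributes": {
        "resource.attributes.k8s@namespace@name": "kube-system",
        "log.attributes.logtag": "F",
    },
}

stored = to_stored_document(in_pipeline_event)
print(stored)
# The stored document has no `attributes` object, which is why its layout
# does not reveal the in-pipeline path.
print("attributes" in stored)
```

Under that assumption, processors and index patterns see the nested form, so the path carries the attributes/ prefix even though the stored document shows the key at the top level.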