opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

Receive OTLP and index into OpenSearch without `log.attributes` prefix #3098

Open alex-stiff opened 1 year ago

alex-stiff commented 1 year ago

Data Prepper pipelines.yaml:

otel-opensearch-pipeline:
  workers: 1
  delay: "5000"
  source:
    otel_logs_source:
      ssl: false
  sink:
  - opensearch:
      hosts: [ "https://es-host:9200" ]
      index: "test-index-%{yyyy.MM.dd}"
      username: admin
      password: <redacted>

This config receives OpenTelemetry logs and forwards them to OpenSearch. All attributes sent to OpenSearch are prefixed with log.attributes, resulting in OpenSearch documents that look like this:

{
  "_index": "test-index-2023.07.31",
  "_type": "_doc",
  "_id": "1237rIkBtBas0TDtiznV",
  "_version": 1,
  "_score": null,
  "_source": {
<rest of source redacted>
    "log.attributes.my_string": "TEST",
    "resource.attributes.telemetry@sdk@language": "dotnet",
    "log.attributes.dotnet@ilogger@category": "LoggingApp.Program",
    "log.attributes.my_int": 123
  }
}

If these prefixes are not used for anything useful in OpenSearch, is there a sensible way in Data Prepper to strip the log.attributes prefix from the keys? The desired source would look like this:

"_source": {
  "my_string": "TEST",
  "resource.attributes.telemetry@sdk@language": "dotnet",
  "dotnet@ilogger@category": "LoggingApp.Program",
  "my_int": 123
}

Note that the log attribute names are not known ahead of time. Thanks.
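
To make the desired mapping concrete, here is a minimal Python sketch of the transformation, written outside of Data Prepper. The helper name strip_prefix is hypothetical and is not a Data Prepper feature; it only illustrates the intended result for keys that are not known ahead of time.

```python
def strip_prefix(source: dict, prefix: str = "log.attributes.") -> dict:
    """Return a copy of `source` with `prefix` removed from matching keys.

    Illustration only: a hypothetical helper, not part of Data Prepper.
    Keys that do not start with the prefix are kept unchanged.
    """
    result = {}
    for key, value in source.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        # Refuse to silently overwrite a field that already exists.
        if new_key in result:
            raise ValueError(f"key collision on {new_key!r}")
        result[new_key] = value
    return result


doc = {
    "log.attributes.my_string": "TEST",
    "resource.attributes.telemetry@sdk@language": "dotnet",
    "log.attributes.dotnet@ilogger@category": "LoggingApp.Program",
    "log.attributes.my_int": 123,
}
stripped = strip_prefix(doc)
```

The sketch raises on a collision (for example, if both my_int and log.attributes.my_int were present) rather than picking a winner, since the right behavior there would be a design decision.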

dlvenable commented 1 year ago

@alex-stiff,

Thanks for the question. Can you use the rename_keys processor for this?

processor:
  ...
  - rename_keys:
      entries:
        - from_key: "log.attributes.my_string"
          to_key: "my_string"
        - from_key: "log.attributes.my_int"
          to_key: "my_int"

You would need to specify each key. I'm not sure whether an existing processor can do this for all log.attributes.* values. We could add this as a new feature.

Let me know if this works for you or if you need something else.
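
When the attribute names are known, the explicit entries do not have to be written by hand. The following is a minimal Python sketch, assuming a list of known keys; the helper name rename_entries is hypothetical, and it only builds a structure mirroring the from_key / to_key options shown above.

```python
def rename_entries(keys, prefix="log.attributes."):
    """Build rename_keys-style entries for keys that start with `prefix`.

    Hypothetical helper for illustration; keys without the prefix
    (or consisting only of the prefix) are skipped.
    """
    return [
        {"from_key": key, "to_key": key[len(prefix):]}
        for key in keys
        if key.startswith(prefix) and len(key) > len(prefix)
    ]


known_keys = [
    "log.attributes.my_string",
    "log.attributes.my_int",
    "resource.attributes.telemetry@sdk@language",
]
entries = rename_entries(known_keys)
```

The resulting list can be serialized into the entries section of pipelines.yaml with whatever YAML tooling is already in use. It does not help when the keys are unknown in advance, which is the limitation noted above.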

FV-ConeLabs commented 1 month ago

I have the same use case: stripping the "log.attributes." prefix from OTel logs. Wildcard or regex support in the rename_keys processor would be useful for us, as manually specifying every possible key name limits the flexibility of our structured logs (or at least adds friction).
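
As a rough illustration of the semantics such regex support might have, here is a hypothetical Python sketch. It is not an existing rename_keys option; the function name, the pattern, and the replacement syntax are all assumptions made for the example.

```python
import re


def rename_keys_regex(event: dict, pattern: str, replacement: str) -> dict:
    """Rename every key matching `pattern`, using `replacement`.

    Hypothetical semantics for illustration only. `replacement` may
    reference capture groups (e.g. r"\1"). Non-matching keys are kept.
    Note: if two keys map to the same new name, the later one wins.
    """
    regex = re.compile(pattern)
    return {
        regex.sub(replacement, key, count=1): value
        for key, value in event.items()
    }


event = {
    "log.attributes.my_int": 123,
    "resource.attributes.telemetry@sdk@language": "dotnet",
}
renamed = rename_keys_regex(event, r"^log\.attributes\.(.+)$", r"\1")
```

Anchoring the pattern with ^ and escaping the dots keeps it from matching keys such as resource.attributes.*, which should stay untouched.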

The current docs don't specify the expected type of the from_key and to_key options (they must be strings, not regexes). This is inconsistent with the docs for some of the other processors, which explicitly state the type.