opensearch-project / data-prepper

OpenSearch Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] rename_keys processor: json pointers with escaped syntax fail to validate #5121

Open joelmarty opened 3 weeks ago

joelmarty commented 3 weeks ago

Describe the bug The escaped syntax for JSON pointers defines how to build JSON pointers for fields whose names include special characters.

However, the isValidKey() method in JacksonEventKey only checks the basic character set, so keys defined with the escaped syntax are rejected.
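
To illustrate the rejection, here is a minimal sketch of a character-set-only check. It is not the actual JacksonEventKey code: the class name, method name, and allowed character set are assumptions, with the character set taken from the error message quoted in this report.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a validator that only checks a basic character set.
class BasicKeyCheckSketch {
    // Assumption: alphanumerics plus . - @ / as listed in the error message,
    // not copied from the Data Prepper source.
    private static final Pattern BASIC = Pattern.compile("^[A-Za-z0-9.\\-@/]+$");

    static boolean isValidKeySketch(String key) {
        return key != null && BASIC.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidKeySketch("field/to/key"));   // true
        System.out.println(isValidKeySketch("cs(host)"));       // false: parentheses
        System.out.println(isValidKeySketch("\"cs(host)\""));   // false: quotes and parentheses
    }
}
```

With a check of this shape, wrapping the key in double quotes cannot help, because the quote characters themselves fall outside the allowed set.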

To Reproduce Steps to reproduce the behavior:

  1. Create a pipeline with a rename_keys processor using an escaped syntax:
    my-file-pipeline:
      source:
        file:
          path: run/data/events.jsonl
          record_type: event
          format: json
      sink:
        - file:
            path: "run/data/result.jsonl"
      processor:
        - rename_keys:
            entries:
              - from_key: host
                to_key: '"cs(host)"'
  2. Run data-prepper
  3. data-prepper fails to start with the error:

    2024-10-28T16:35:39,367 [main] ERROR org.opensearch.dataprepper.core.validation.LoggingPluginErrorsHandler - 1. rp-pipeline-file.processor.rename_keys: caused by: Parameter "entries.null.to_key" for plugin "renamekeys" is invalid: key "cs(host)" must contain only alphanumeric chars with .-@/ and must follow JsonPointer (ie. 'field/to/key')

Expected behavior The to_key argument "cs(host)" should be accepted as it conforms to the documented syntax.

Screenshots N/A

Environment (please complete the following information):

Additional context N/A

dlvenable commented 3 weeks ago

@joelmarty, do you want to produce key names with parentheses in them?

dlvenable commented 3 weeks ago

Perhaps we can make use of escape sequences to allow parentheses. Right now, our validation only looks at the characters themselves and does not allow them to be escaped.

We have some related work in #5111.
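
One possible shape for escape-aware validation is sketched below. This is only an illustration of the idea discussed above, not the project's design or the change in #5111: the class name, method names, and the rule that a key wrapped in double quotes may contain any character except an embedded double quote are all assumptions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: accept either a basic key or a double-quoted (escaped) key.
class EscapedKeyCheckSketch {
    // Assumption: basic character set taken from the error message in this issue.
    private static final Pattern BASIC = Pattern.compile("^[A-Za-z0-9.\\-@/]+$");

    // A key counts as escaped here when it starts and ends with a double quote.
    static boolean isQuoted(String key) {
        return key.length() >= 2 && key.startsWith("\"") && key.endsWith("\"");
    }

    static boolean isValidKeySketch(String key) {
        if (key == null || key.isEmpty()) {
            return false;
        }
        if (isQuoted(key)) {
            String inner = key.substring(1, key.length() - 1);
            // Assumed rule: the quoted body must be non-empty and contain no double quote.
            return !inner.isEmpty() && inner.indexOf('"') < 0;
        }
        return BASIC.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidKeySketch("\"cs(host)\"")); // true: escaped form
        System.out.println(isValidKeySketch("cs(host)"));     // false: unescaped parentheses
        System.out.println(isValidKeySketch("host"));         // true: basic key
    }
}
```

Keeping the unquoted branch unchanged means existing pipeline configurations would validate exactly as before, and only keys that opt in to the quoted form get the wider character set.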

joelmarty commented 3 weeks ago

@dlvenable yes, I am trying to produce field names compatible with the W3C Extended Log File Format, which uses the form prefix(header) to designate headers sent in the request or the response. For instance, cs(user-agent) is the field for the user-agent header sent in the request.
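
As a small illustration of the naming scheme described above, the sketch below builds a prefix(header) field name and wraps it in double quotes the way the to_key value in the reproduction config does. The class and method names are hypothetical and exist only for this example.

```java
// Hypothetical helper showing how prefix(header) field names are composed.
class W3cFieldNameSketch {
    // Builds a field name such as cs(user-agent) from a prefix and a header name.
    static String headerField(String prefix, String header) {
        return prefix + "(" + header + ")";
    }

    // Wraps the field name in double quotes, mirroring the quoted to_key in the report.
    static String toEscapedKey(String prefix, String header) {
        return "\"" + headerField(prefix, header) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(headerField("cs", "user-agent")); // cs(user-agent)
        System.out.println(toEscapedKey("cs", "host"));      // "cs(host)"
    }
}
```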