opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] Unable to parse date time using defined patterns in Date Processor #4815

Open sb2k16 opened 1 month ago

sb2k16 commented 1 month ago

The Date Processor in Data Prepper is unable to parse date-time values with the patterns defined in the pipeline YAML configuration below.

For example, if testMessage.log contains the line {"message": "Jul 30, 2024 3:28:55 PM"}, Data Prepper fails to parse the date time from the message key and does not add @timestamp to the output event.

It looks like removing the line that defaults the hour, parseDefaulting(ChronoField.HOUR_OF_DAY, 0), from here produces the expected behavior.

version: "2"
test-pipeline:
  source:
    file:
      path: "./testMessage.log"
      format: "json"
      record_type: "event"
  processor:
    - date:
        match:
          - key: "message"
            patterns: ["MMM dd, yyyy HH:mm:ss a",
                       "MMM dd, yyyy H:mm:ss a", "MMM dd, yyyy hh:mm:ss a",
                       "MMM dd, yyyy h:mm:ss a", "MMM d, yyyy h:mm:ss a"]
        destination: "@timestamp"
        destination_timezone: "UTC"
        to_origination_metadata: true
  sink:
    - stdout:
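The behavior reported above can be reproduced with plain java.time, outside Data Prepper. This is a minimal sketch that assumes the processor appends the user pattern first and then a default of 0 for HOUR_OF_DAY; the class HourDefaultConflictDemo and its methods are hypothetical, not Data Prepper code. It uses "MMM d, yyyy h:mm:ss a", one of the patterns from the configuration above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical demo class; assumes the pattern is appended before the hour default.
class HourDefaultConflictDemo {

    // Builds a formatter for the pattern, optionally defaulting HOUR_OF_DAY to 0.
    static DateTimeFormatter formatter(String pattern, boolean defaultHourOfDay) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder().appendPattern(pattern);
        if (defaultHourOfDay) {
            // Only applied if the parsed text did not already set HOUR_OF_DAY.
            builder.parseDefaulting(ChronoField.HOUR_OF_DAY, 0);
        }
        // English locale so "Jul" and "PM" are recognized regardless of the JVM default.
        return builder.toFormatter(Locale.ENGLISH);
    }

    // Returns the parsed value, or null if parsing or resolving fails.
    static LocalDateTime tryParse(String text, String pattern, boolean defaultHourOfDay) {
        try {
            return LocalDateTime.parse(text, formatter(pattern, defaultHourOfDay));
        } catch (DateTimeParseException e) {
            System.out.println("failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "Jul 30, 2024 3:28:55 PM";
        String pattern = "MMM d, yyyy h:mm:ss a";
        // With the default: h=3 plus PM resolves to hour-of-day 15, which
        // conflicts with the defaulted 0, so this prints the failure and then null.
        System.out.println(tryParse(text, pattern, true));
        // Without the default: resolves to 2024-07-30T15:28:55.
        System.out.println(tryParse(text, pattern, false));
    }
}
```

The default itself is harmless here because the parsed text never sets HOUR_OF_DAY directly; the failure comes from the resolver cross-checking the defaulted 0 against the hour it derives from h and a.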

serbozanrevd commented 3 weeks ago

I am running into the same bug. I am definitely not fluent in Java.

But based on my understanding of the Javadoc, I am not in favour of removing the line that defaults the hour, parseDefaulting(ChronoField.HOUR_OF_DAY, 0), from here.

I think the patch should go more in this direction:

check whether the AM/PM letter a appears in the pattern, as in this example:

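    // If the pattern has an AM/PM marker ('a'), default the hour-of-am-pm;
    // otherwise keep defaulting the hour-of-day.
    if (pattern.contains("a")) {
      dateTimeFormatterBuilder.parseDefaulting(ChronoField.HOUR_OF_AMPM, 0);
    } else {
      dateTimeFormatterBuilder.parseDefaulting(ChronoField.HOUR_OF_DAY, 0);
    }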

And for AM/PM patterns, the hour letter should not be h but K (link), since K is the hour-of-am-pm field (0-11).
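A minimal sketch of this proposal, again in plain java.time with a hypothetical class name (AmPmAwareDefaultSketch is not Data Prepper code): default HOUR_OF_AMPM when the pattern contains a, otherwise HOUR_OF_DAY. Per the DateTimeFormatter Javadoc, K is hour-of-am-pm (0-11) and h is clock-hour-of-am-pm (1-12). With K the parsed text already sets HOUR_OF_AMPM, so the default is skipped; with h the defaulted HOUR_OF_AMPM of 0 still conflicts with the parsed clock hour. The pattern.contains("a") check is taken from the comment and is naive: it would also match an a inside quoted literal text.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical sketch of the conditional default suggested in the comment.
class AmPmAwareDefaultSketch {

    static DateTimeFormatter formatter(String pattern) {
        DateTimeFormatterBuilder builder = new DateTimeFormatterBuilder().appendPattern(pattern);
        // Naive check: also matches an 'a' inside quoted literal text.
        if (pattern.contains("a")) {
            builder.parseDefaulting(ChronoField.HOUR_OF_AMPM, 0);
        } else {
            builder.parseDefaulting(ChronoField.HOUR_OF_DAY, 0);
        }
        // English locale so "Jul" and "PM" are recognized regardless of the JVM default.
        return builder.toFormatter(Locale.ENGLISH);
    }

    // Returns the parsed value, or null if parsing or resolving fails.
    static LocalDateTime tryParse(String text, String pattern) {
        try {
            return LocalDateTime.parse(text, formatter(pattern));
        } catch (DateTimeParseException e) {
            System.out.println("failed: " + e.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String text = "Jul 30, 2024 3:28:55 PM";
        // 'K' sets HOUR_OF_AMPM directly, so the default is skipped: 2024-07-30T15:28:55
        System.out.println(tryParse(text, "MMM d, yyyy K:mm:ss a"));
        // 'h' sets CLOCK_HOUR_OF_AMPM; the defaulted HOUR_OF_AMPM=0 conflicts with 3: null
        System.out.println(tryParse(text, "MMM d, yyyy h:mm:ss a"));
        // No 'a': HOUR_OF_DAY defaults to 0, so a date-only pattern resolves to midnight
        System.out.println(tryParse("Jul 30, 2024", "MMM d, yyyy"));
    }
}
```

The date-only case shows what the HOUR_OF_DAY default provides: without some hour, java.time cannot build a LocalDateTime from a pattern that contains no time fields.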

serbozanrevd commented 3 weeks ago

In fact, there is already an open PR: https://github.com/opensearch-project/data-prepper/pull/4564.