opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[BUG] Dynamic date detection causes parsing failure on certain strings #4055

Open pengxiaolong opened 2 years ago

pengxiaolong commented 2 years ago

Describe the bug When indexing a document with a field in a date-time format that OpenSearch does not support, OpenSearch always throws an exception. This case was handled correctly in ElasticSearch-6.8. Document example:

{
  "DateTime":"2018-03-30T17-30-28.842Z"
}

To Reproduce Steps to reproduce the behavior:

  1. Go to 'Dev Tools' in Kibana/OpenSearch Dashboards
  2. Run PUT test-index to create an index
  3. Run PUT test-index/_doc/1 to index the document mentioned in the description
  4. See error:
    {
      "error" : {
        "root_cause" : [
          {
            "type" : "date_time_exception",
            "reason" : "date_time_exception: Value out of range: Hour[0-23], Minute[0-59], Second[0-59]"
          }
        ],
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [DateTime] of type [date] in document with id '1'. Preview of field's value: '2018-03-30T17-30-28.842Z'",
        "caused_by" : {
          "type" : "date_time_exception",
          "reason" : "date_time_exception: Value out of range: Hour[0-23], Minute[0-59], Second[0-59]"
        }
      },
      "status" : 400
    }

Expected behavior The document should be indexed without error.

Host/Environment: AWS OpenSearch-1.2/1.3 on AWS Graviton

dblock commented 2 years ago

Reading the documentation, the timezone is the unsupported part. It would make a lot of sense to parse and store data with a timezone, but I don't see how this worked in ES 6.8. Did it really work out of the box? ES seems to support this with custom mappings.

Maybe help dig up what we inherited at the time of the fork and maybe propose what we should do in OpenSearch for this?

pengxiaolong commented 2 years ago

The timezone is not the cause of the issue. Z is a valid timezone designator, called Zulu time (Coordinated Universal Time). The issue is that the separator between hour, minute, and second is - in the test, which causes the date_time_exception.
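
To illustrate the separator point, here is a minimal sketch using plain java.time rather than OpenSearch's own date parsers, so it only approximates what the mapper does. The class name SeparatorCheck and the helper isIsoInstant are hypothetical names chosen for this example.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper class; not part of OpenSearch.
class SeparatorCheck {
    /** Returns true if the text parses as an ISO-8601 instant, e.g. 2018-03-30T17:30:28.842Z. */
    static boolean isIsoInstant(String text) {
        try {
            Instant.parse(text); // uses DateTimeFormatter.ISO_INSTANT
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Colon-separated time with a trailing Z: accepted.
        System.out.println(isIsoInstant("2018-03-30T17:30:28.842Z")); // prints true
        // Same value with '-' between hour, minute, and second: rejected.
        System.out.println(isIsoInstant("2018-03-30T17-30-28.842Z")); // prints false
    }
}
```

With the colon-separated time, the trailing Z parses without complaint; only the hyphen-separated variant is rejected.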

pengxiaolong commented 2 years ago

The bug is related to date_detection. The string "2018-03-30T17-30-28.842Z" should be detected as keyword/text, not as a date, because the format doesn't match any date-time format supported by OpenSearch.

Here is a similar test I did in ElasticSearch-6.8:

PUT /test
{
  "mappings": {
    "_doc": {
      "date_detection": "true"
    }
  }
}

PUT /test/_doc/1
{
  "field1":"2018-03-30T17-30-28.842Z"
}

GET /test/_mapping

Mapping of the index after indexing the doc:

{
  "test" : {
    "mappings" : {
      "_doc" : {
        "date_detection" : true,
        "properties" : {
          "field1" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          }
        }
      }
    }
  }
}

pengxiaolong commented 2 years ago

If you run the same test in OpenSearch-1.2/1.3, it returns a date_time_exception when indexing the doc.
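
The fallback expected in this test, where a string that matches none of the configured date formats is mapped as text instead of failing the request, can be sketched as follows. This is an illustration in plain java.time under stated assumptions, not the OpenSearch implementation or the actual fix: DateDetectionSketch, FORMATS, and detectType are hypothetical names, and the two formatters merely stand in for the configured dynamic date formats.

```java
import java.time.DateTimeException;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical sketch of "try each date format, otherwise fall back to text".
class DateDetectionSketch {
    // Assumption: stand-ins for the configured dynamic date formats.
    static final List<DateTimeFormatter> FORMATS = List.of(
        DateTimeFormatter.ISO_OFFSET_DATE_TIME,
        DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss"));

    /** Returns "date" if any format accepts the value, otherwise "text". */
    static String detectType(String value) {
        for (DateTimeFormatter format : FORMATS) {
            try {
                format.parse(value);
                return "date";
            } catch (DateTimeException e) {
                // DateTimeParseException is a subclass of DateTimeException,
                // so any date-time failure just means "try the next format".
            }
        }
        return "text";
    }

    public static void main(String[] args) {
        System.out.println(detectType("2018-03-30T17:30:28.842Z")); // prints date
        System.out.println(detectType("2018/03/30 17:30:28"));      // prints date
        System.out.println(detectType("2018-03-30T17-30-28.842Z")); // prints text
    }
}
```

The design choice in this sketch is to catch the broad DateTimeException inside the detection loop, so that a failure while probing a format never escapes to the caller as an indexing error.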

dblock commented 2 years ago

@Pengxiaolong Thanks. We/someone/you needs to debug it. If you want to pick it up yourself, I would start by writing failing tests for this scenario.

Yury-Fridlyand commented 1 year ago

date_detection is enabled by default, but it supports only 3 simple default formats: https://github.com/opensearch-project/OpenSearch/blob/6a19660ba203cd5ea0f13fae9844044ffd58df90/server/src/main/java/org/opensearch/index/mapper/RootObjectMapper.java#L74-L78 @Pengxiaolong please try using the yyyy/MM/dd HH:mm:ss format or specify your own in the mapping.

pengxiaolong commented 1 year ago

Thank you @Yury-Fridlyand! I know how to bypass it by setting dynamic_date_formats to override the default settings for date_detection.

Specifying a date format in the mapping doesn't work, because date_detection runs first, so the exception from date_detection still causes the error.

The bug was introduced by the changes migrating from Joda-Time to the Java date-time APIs. It is actually easy to fix: I have fixed it in my local repo, but I don't have permission to send a pull request.