opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
8.89k stars 1.63k forks source link

Added New Fast DateTime parser implementation for RFC3339 Datetime format #11465

Closed CaptainDredge closed 4 months ago

CaptainDredge commented 6 months ago

Description

This PR adds new rfc3339 format for datetime parsing whose implementation is not based on the usual java datetime formatters but is hand written to make it extremely fast. RFC3339 format is one of the most used format for datetime across internet. The parser implements a lenient version of rfc3339 which aligns more with W3C Note. This implementation is 42x faster than strict_date_optional_time_format to parse rfc3339 compatible datetime strings.

Microbenchmark results

DocumentParsingBenchmark.benchRFC3339Parser               2023-01-01T23:38:34.000Z  avgt   30    66.321 ±  0.424  ns/op
DocumentParsingBenchmark.benchRFC3339Parser               1970-01-01T00:16:12.675Z  avgt   30    68.199 ±  1.341  ns/op
DocumentParsingBenchmark.benchRFC3339Parser               5050-01-01T12:02:01.123Z  avgt   30    66.213 ±  0.311  ns/op
DocumentParsingBenchmark.strictDateOptionalTimeFormatter  2023-01-01T23:38:34.000Z  avgt   30  2732.462 ± 10.043  ns/op
DocumentParsingBenchmark.strictDateOptionalTimeFormatter  1970-01-01T00:16:12.675Z  avgt   30  2794.871 ±  8.151  ns/op
DocumentParsingBenchmark.strictDateOptionalTimeFormatter  5050-01-01T12:02:01.123Z  avgt   30  2779.538 ± 13.909  ns/op

Related Issues

Resolves #8361

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions[bot] commented 6 months ago

Compatibility status:

Checks if related components are compatible with change c325fb2

Incompatible components

Incompatible components: [https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/k-nn.git]

github-actions[bot] commented 6 months ago

:x: Gradle check result for 4c85946eae85d8678e0f183ba7a170ffaf713b76: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 6 months ago

:x: Gradle check result for 6c6b49755e8453a5f007b5018d8f1722750e0e6f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

CaptainDredge commented 6 months ago

Working on CI failures after which it should be ready for review

github-actions[bot] commented 5 months ago

:x: Gradle check result for d8631d67825f08f0c347a18705becf2dfbab1002: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for ce6ebaa6d0bb88de87cfa315ef678870c368b934: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for 2e155146655d79739e80e5474c6168a6ecc8f25a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for 5ce7e95fbb3ad7d06b9bb83757a65d55272a5397: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for b7262e94984b61643adb716cbe39fa627c995d07: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for 6801409b7c567daa6308b7a7d1c00ebde2f27676: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for ca4068b261596db48d100ac9bfc3ef34d161bda1: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:x: Gradle check result for 75a4c45dde1d5cf10d2e2ec8ffca107e87d8e542: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 5 months ago

:grey_exclamation: Gradle check result for c01debdb1736da41e66cabeaffe49525b9373806: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

CaptainDredge commented 5 months ago

TEST FAILURES: 1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test #9191

codecov[bot] commented 5 months ago

Codecov Report

Attention: 61 lines in your changes are missing coverage. Please review.

Comparison is base (30aa8be) 71.33% compared to head (c325fb2) 71.39%.

Files Patch % Lines
...ommon/time/RFC3339CompatibleDateTimeFormatter.java 72.62% 28 Missing and 21 partials :warning:
...earch/common/time/OpenSearchDateTimeFormatter.java 52.38% 10 Missing :warning:
.../org/opensearch/common/time/JavaDateFormatter.java 92.30% 2 Missing :warning:
Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #11465 +/- ## ============================================ + Coverage 71.33% 71.39% +0.05% - Complexity 59438 59475 +37 ============================================ Files 4923 4925 +2 Lines 279218 279453 +235 Branches 40596 40630 +34 ============================================ + Hits 199186 199508 +322 + Misses 63506 63316 -190 - Partials 16526 16629 +103 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

github-actions[bot] commented 5 months ago

:grey_exclamation: Gradle check result for a970d172016c0d5063b62c8e9c4fbb0fbc0dcecf: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

CaptainDredge commented 5 months ago

TEST FAILURES: 1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test #10755

CaptainDredge commented 5 months ago

@shwetathareja / @reta Can you please review the PR?

reta commented 5 months ago

@shwetathareja / @reta Can you please review the PR?

Apologies @CaptainDredge , I may get to it later this week, sorry about that

CaptainDredge commented 5 months ago

@reta gentle ping 😅

reta commented 5 months ago

Found some more time to look into it (sorry for delay @CaptainDredge ), I think we could solely rely on DateFormatter and get rid of CustomDateTimeFormatter altogether:

github-actions[bot] commented 5 months ago

:x: Gradle check result for bf58f1d33242006bd5764e15c8967c51f2c964ac: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

CaptainDredge commented 5 months ago

I think we could solely rely on DateFormatter and get rid of CustomDateTimeFormatter altogether

Thanks @reta for the suggestion but I think there's a fundamental issue why it my not work. If you look at DateFormatter interface forPattern function which is the default implementation of the function used throughout codebase, its dependent on JavaDateFormatter which is its concrete implementation. Thus the concrete class abstractions has been already leaked to the interface. Hence, at runtime we will not be able to selectively decide which one to choose out of JavaDateFormatter and RfcDateFormatter that you suggested. Let me know if there are some gaps in my understanding

github-actions[bot] commented 5 months ago

:grey_exclamation: Gradle check result for 228eb865741fff8b5911c4f86af045722864d00a: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

reta commented 5 months ago

Thus the concrete class abstractions has been already leaked to the interface. Hence, at runtime we will not be able to selectively decide which one to choose out of JavaDateFormatter and RfcDateFormatter that you suggested.

I am not sure I understand this issue: indeed DateFormatter::forPattern uses JavaDateFormatter.combined static method, that is implementation type dependency and does not leak JavaDateFormatter outside. JavaDateFormatter.combined in turn depends only on DateFormatters so we should no see any issues here: "strict_date_optional_time_nanos || rfc3339" should work just fine.

~Or I am missing something here?~

Oh I see now, the API of JavaDateFormatter.combined is lying to us, it asks DateFormatter but needs JavaDateFormatter:

assert formatter instanceof JavaDateFormatter;

I think we may need one more abstraction (which could be package private) to open up parsers and printers there, let me work it through.

UPD: I believe we could lift JavaDateFormatter restriction by introducing InternalDateFormatter, that both JavaXxx and RfcXxx would implement.

abstract class InternalDateFormatter implements DateFormatter {

    abstract DateTimeFormatter getPrinter();

    abstract Collection<? extends DateTimeFormatter> getParsers();

    abstract JavaDateFormatter getRoundupParser();

    abstract String format();
}
CaptainDredge commented 5 months ago

@reta I tried implementing the above suggested approach but I still didn't find a way to get rid of OpenSearchDateTimeFormatter by introducing the abstraction of InternalDateFormatter because DateTimeFormatter is a final class and hence cannot be extended which means getParsers can only return Collection<DateTimeFormatter>. Any new formatter like RFC3339DateTimeFormatter which doesn't depend on DateTimeFormatter for parsing won't be able to return such a collection, instead it has to return itself(this object) which has to comply with DateFormatter interface.

What are some of the concerns you see in current design of having a wrapper around DateTimeFormatter ? I feel like it gives us the opportunity to override behaviour of parsers at lowest level possible i.e at the level of DateTimeFormatter instead of JavaDateFormatter level

github-actions[bot] commented 5 months ago

:white_check_mark: Gradle check result for ad3566525959e46865bd5fd17dd1bc380fabb971: SUCCESS

reta commented 5 months ago

extended which means getParsers can only return Collection<DateTimeFormatter>.

Thanks @CaptainDredge , right, this is indeed an issue with the suggested solution

What are some of the concerns you see in current design of having a wrapper around DateTimeFormatter ?

Yes, I would prefer to not have any wrappers at all: few optimizations went into date / time parsing but additional wrappers clearly would not help here. Plus, we need the wrapper for one particular case (new RFC compatible parser) but sadly now every other parser would pay this price (yes, subclassing DateTimeFormatter would be easiest way forward but it is not possible).

CaptainDredge commented 5 months ago

Thanks @reta but if not the wrapper then I don't see an alternate solution here

we need the wrapper for one particular case

Also, I think its not a particular case, we'll be adding new fast implementation of parsers like for LocalDateTime etc.

reta commented 5 months ago

Thanks @reta but if not the wrapper then I don't see an alternate solution here

There are alternative solutions, just one of them: keep JavaDateFormatter as-is (so it only deals with DateTimeFormatter) and introduce OpenSearchJavaDateFormatter (that deals with newly introduced OpenSearchDateFormatter). In this case, the only place we need to wrap is in the JavaDateFormatter::combine implementation, when there is a mix of OpenSearchJavaDateFormatters and JavaDateFormatters (otherwise the JavaDateFormatters could be combined without the need to be wrapped). I will share some more thoughts on that next week (sorry on vacation these days like most of us).

Also, I think its not a particular case, we'll be adding new fast implementation of parsers like for LocalDateTime etc.

That still does not change the statement in principle - the wrapping would be needed in a few cases.

reta commented 4 months ago

@CaptainDredge just quick update on OpenSearchDateTimeFormatter wrappper (apologies for some delay), I have been running some tests and it shows negligible performance impact on date parsing, so I would like to withdraw my concern (for information only, I was able to eliminate wrapping for existing flows and only introduce it when needed but the implementation was not pretty and something I would like not to see or suggest), let's work on other review comments so we could finish this one. Thanks!

ethlo commented 4 months ago

Hi! I understood that you needed a fast parser and have already found the ethlo ITU parser. I took the opportunity to make it even faster as I have gotten some feedback lately that the project is extremely useful for weak CPUs (even on android devices). 1.7.5 should be nearly twice as fast as 1.7.3. See https://github.com/ethlo/itu?tab=readme-ov-file#performance for the latest performance numbers.

Would it not be feasible to keep the parser you have and put ITU in front of it? If it parses correctly, all is well and you gain a lot of performance, otherwise fall back to your legacy parser and maintain perfect backwards compatibility.

CaptainDredge commented 4 months ago

Would it not be feasible to keep the parser you have and put ITU in front of it? If it parses correctly, all is well and you gain a lot of performance, otherwise fall back to your legacy parser and maintain perfect backwards compatibility.

Its not as simple as that because there are differences in the way each parser will parse, for eg. the local datetime parser won't handle time zone etc. ITU has a specific implementation which may/maynot conform with the date format we're trying to provide to users. Also, an external dependency would be another overhead to maintain

CaptainDredge commented 4 months ago

let's work on other review comments so we could finish this one. Thanks!

@reta Thanks I've addressed other review comments, let me know what else needs to be addressed to take it to closure :)

github-actions[bot] commented 4 months ago

:x: Gradle check result for 112a2b22344a60bc2485459b7a4b00e868680cf7: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions[bot] commented 4 months ago

:grey_exclamation: Gradle check result for 5727f76bc5fbccfd475c02afa32d499678220c88: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

CaptainDredge commented 4 months ago

TEST FAILURES: 1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testReadRangeBlobWithRetries Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Flaky test #6090

github-actions[bot] commented 4 months ago

:x: Gradle check result for e0eed1a956c3e98c2b6257ff92c6c6cabb279985: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

reta commented 4 months ago

Thanks @CaptainDredge , I have one comment related to changes in exception handling but LGTM otherwise!

github-actions[bot] commented 4 months ago

:x: Gradle check result for c1bc4463a7c3ad46f3f416eec48fa4a27256645c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

CaptainDredge commented 4 months ago

Thanks @CaptainDredge , I have one comment related to changes in exception handling but LGTM otherwise!

@reta Thank you so much for your continuous feedback and helping in taking this to completion :)

I'm trying to get CI green but I think there's some problem with spotless plugin currently due to which precommits are failing

A problem occurred configuring project ':build-tools'.
> Could not create task ':build-tools:spotlessJava'.
   > java.io.IOException: Failed to load eclipse jdt formatter: java.lang.RuntimeException: java.net.SocketTimeoutException: timeout
reta commented 4 months ago

I'm trying to get CI green but I think there's some problem with spotless plugin currently due to which precommits are failing

Yeah, it seems like some network hiccups :(

github-actions[bot] commented 4 months ago

:x: Gradle check result for 75150f12f1086886e802dd5b3f7b350365c3c636: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

reta commented 4 months ago

❌ Gradle check result for 75150f1: FAILURE

Should be fixed by https://github.com/opensearch-project/OpenSearch/pull/11837

CaptainDredge commented 4 months ago

Rebased on main after #11837 was merged. Still there are some socket timeout issues in precommits

github-actions[bot] commented 4 months ago

:x: Gradle check result for 3d89308835256779d287314061e9b76c685fe557: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

CaptainDredge commented 4 months ago

9191 Flaky test failing

reta commented 4 months ago

@CaptainDredge please create a documentation issue here [1] since this is change in public API (new data format people can use), thank you.

[1] https://github.com/opensearch-project/documentation-website

reta commented 4 months ago

@mgodwan mind please checking this one out? need your signoff as well, thank you

github-actions[bot] commented 4 months ago

:white_check_mark: Gradle check result for 3d89308835256779d287314061e9b76c685fe557: SUCCESS