opensearch-project / anomaly-detection

Identify atypical data and receive automatic notifications
https://opensearch.org/docs/latest/monitoring-plugins/ad/index/
Apache License 2.0

Fix race condition in PageListener #1351

Closed kaituo closed 1 month ago

kaituo commented 1 month ago

Description

This PR fixes a race condition in which sentOutPages might not have been incremented before the check that decides whether to schedule the imputeHC task. By accurately tracking both the number of in-flight pages and the number of sent pages, the change ensures that imputeHC runs only after all pages have been fully processed and all responses have been received.
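The counting scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `PageCompletionTracker` and its methods `beforeSend`, `onResponse`, and `onAllPagesSent` are hypothetical names, and the sketch assumes a single sender that calls `beforeSend` for every page before it calls `onAllPagesSent`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not the plugin's actual PageListener. */
final class PageCompletionTracker {
    private final AtomicInteger sentOutPages = new AtomicInteger();
    private final AtomicInteger inFlightPages = new AtomicInteger();
    private final AtomicBoolean allPagesSent = new AtomicBoolean();
    private final AtomicBoolean finalTaskStarted = new AtomicBoolean();
    private final Runnable finalTask; // stands in for scheduling the imputeHC task

    PageCompletionTracker(Runnable finalTask) {
        this.finalTask = finalTask;
    }

    /** Call BEFORE dispatching a page, so a fast response never sees a stale count. */
    void beforeSend() {
        sentOutPages.incrementAndGet();
        inFlightPages.incrementAndGet();
    }

    /** Call when the response for one page arrives, whether it succeeded or failed. */
    void onResponse() {
        inFlightPages.decrementAndGet();
        maybeRunFinalTask();
    }

    /** Call once, after the last page has been handed to beforeSend. */
    void onAllPagesSent() {
        allPagesSent.set(true);
        maybeRunFinalTask();
    }

    int sentOutPages() {
        return sentOutPages.get();
    }

    private void maybeRunFinalTask() {
        // Run only when nothing more will be sent and nothing is outstanding;
        // compareAndSet guarantees the task starts at most once.
        if (allPagesSent.get()
            && inFlightPages.get() == 0
            && finalTaskStarted.compareAndSet(false, true)) {
            finalTask.run();
        }
    }
}
```

The key design choice in this sketch is that the counters are updated before a page is dispatched, and the completion check runs from both the response path and the end-of-paging path, so whichever event happens last triggers the final task.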

Testing done:

  1. Reproduced the race condition by starting two detectors with imputation. The race causes RCF to throw an out-of-order illegal argument exception. Also verified that the change fixes the problem.
  2. Added an integration test (IT) for the above scenario.

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov[bot] commented 1 month ago

Codecov Report

Attention: Patch coverage is 85.71429% with 1 line in your changes missing coverage. Please review.

Project coverage is 80.00%. Comparing base (da73506) to head (2b497b9). Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ensearch/timeseries/transport/ResultProcessor.java | 85.71% | 0 Missing and 1 partial :warning: |

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351/graphs/tree.svg?width=650&height=150&src=pr&token=ZPONorT0bX&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project)](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project)

```diff
@@             Coverage Diff             @@
##               main    #1351     +/-   ##
=========================================
  Coverage     80.00%   80.00%
- Complexity     5662     5673     +11
=========================================
  Files           533      533
  Lines         23429    23430      +1
  Branches       2335     2334      -1
=========================================
+ Hits          18745    18746      +1
- Misses         3573     3578      +5
+ Partials       1111     1106      -5
```

| [Flag](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project) | Coverage Δ | |
|---|---|---|
| [plugin](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project) | `80.00% <85.71%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project#carryforward-flags-in-the-pull-request-comment) to find out more.

| [Files with missing lines](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project) | Coverage Δ | |
|---|---|---|
| [...imeseries/transport/ResultBulkTransportAction.java](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351?src=pr&el=tree&filepath=src%2Fmain%2Fjava%2Forg%2Fopensearch%2Ftimeseries%2Ftransport%2FResultBulkTransportAction.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project#diff-c3JjL21haW4vamF2YS9vcmcvb3BlbnNlYXJjaC90aW1lc2VyaWVzL3RyYW5zcG9ydC9SZXN1bHRCdWxrVHJhbnNwb3J0QWN0aW9uLmphdmE=) | `70.58% <ø> (ø)` | |
| [...ensearch/timeseries/transport/ResultProcessor.java](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351?src=pr&el=tree&filepath=src%2Fmain%2Fjava%2Forg%2Fopensearch%2Ftimeseries%2Ftransport%2FResultProcessor.java&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project#diff-c3JjL21haW4vamF2YS9vcmcvb3BlbnNlYXJjaC90aW1lc2VyaWVzL3RyYW5zcG9ydC9SZXN1bHRQcm9jZXNzb3IuamF2YQ==) | `78.90% <85.71%> (+0.88%)` | :arrow_up: |

... and [14 files with indirect coverage changes](https://app.codecov.io/gh/opensearch-project/anomaly-detection/pull/1351/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=opensearch-project)

amitgalitz commented 1 month ago

Overall, were we still processing the results of detector 1 when detector 2 had already finished and moved on to scheduleImputeHCTask?

kaituo commented 1 month ago

> Overall, were we still processing the results of detector 1 when detector 2 had already finished and moved on to scheduleImputeHCTask?

scheduleImputeHCTask is detector-specific, so when detector 2 finishes, its scheduleImputeHCTask starts regardless of the state of detector 1.