opensearch-project / alerting

📟 Get notified when your data meets certain conditions by setting up monitors, alerts, and notifications
https://opensearch.org/docs/latest/monitoring-plugins/alerting/index/
Apache License 2.0

Fix flaky integ tests #298

Closed: ohltyler closed this issue 2 years ago

ohltyler commented 2 years ago

When running the distribution integ tests, I commonly see these failures from the alerting plugin:

Security disabled (see Playground/ohltyler-integ-test-stable run 60, arm64, version 1.2.2):

Suite: Test class org.opensearch.alerting.MonitorRunnerIT
  2> феб 01, 2022 12:44:17 AM org.opensearch.client.RestClient logResponse
  2> WARNING: request [PUT http://localhost:9200/.opendistro-anomaly-results-history-2020.10.17] returned 1 warnings: [299 OpenSearch-1.2.2-123d41ce4fad54529acd7a290efed848e707b624 "index name [.opendistro-anomaly-results-history-2020.10.17] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices"]
  2> феб 01, 2022 12:44:18 AM org.opensearch.client.RestClient logResponse
  2> WARNING: request [PUT http://localhost:9200/.opendistro-anomaly-results-history-2020.10.17] returned 1 warnings: [299 OpenSearch-1.2.2-123d41ce4fad54529acd7a290efed848e707b624 "index name [.opendistro-anomaly-results-history-2020.10.17] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices"]
  2> феб 01, 2022 12:44:22 AM org.opensearch.client.RestClient logResponse
  2> WARNING: request [PUT http://localhost:9200/.opendistro-anomaly-results-history-2020.10.17] returned 1 warnings: [299 OpenSearch-1.2.2-123d41ce4fad54529acd7a290efed848e707b624 "index name [.opendistro-anomaly-results-history-2020.10.17] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices"]
  2> феб 01, 2022 12:44:24 AM org.opensearch.client.RestClient logResponse
  2> WARNING: request [PUT http://localhost:9200/.ewlydxmrjg] returned 1 warnings: [299 OpenSearch-1.2.2-123d41ce4fad54529acd7a290efed848e707b624 "index name [.ewlydxmrjg] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices"]
  2> феб 01, 2022 12:44:25 AM org.opensearch.client.RestClient logResponse
  2> WARNING: request [PUT http://localhost:9200/.opendistro-anomaly-results-history-2020.10.17] returned 1 warnings: [299 OpenSearch-1.2.2-123d41ce4fad54529acd7a290efed848e707b624 "index name [.opendistro-anomaly-results-history-2020.10.17] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices"]
  2> REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.MonitorRunnerIT.test bucket-level monitor with acknowledged alert" -Dtests.seed=ACD4C7F5C8A5133D -Dtests.security.manager=false -Dtests.locale=sr -Dtests.timezone=PLT -Druntime.java=14
  2> java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([ACD4C7F5C8A5133D:A71CEF7486704F5B]:0)
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertNotNull(Assert.java:712)
        at org.junit.Assert.assertNotNull(Assert.java:722)
        at org.opensearch.alerting.MonitorRunnerIT.verifyAlert(MonitorRunnerIT.kt:1600)
        at org.opensearch.alerting.MonitorRunnerIT.test bucket-level monitor with acknowledged alert(MonitorRunnerIT.kt:1128)
  2> NOTE: leaving temporary files on disk at: /tmp/tmpzgxjyrsk/alerting/alerting/build/testrun/integTest/temp/org.opensearch.alerting.MonitorRunnerIT_ACD4C7F5C8A5133D-001
  2> NOTE: test params are: codec=Asserting(Lucene87): {}, docValues:{}, maxPointsInLeafNode=991, maxMBSortInHeap=5.56516294962998, sim=Asserting(RandomSimilarity(queryNorm=false): {}), locale=sr, timezone=PLT
  2> NOTE: Linux 4.14.243-185.433.amzn2.aarch64 aarch64/AdoptOpenJDK 14.0.2 (64-bit)/cpus=16,threads=1,free=349050224,total=536870912
  2> NOTE: All tests run in this JVM: [MonitorRunnerIT]

174 tests completed, 1 failed

=== Standard output of node `node{:alerting:integTest-0}` ===

FAILURE: Build failed with an exception.

ohltyler commented 2 years ago

Two more failures seen in 1.3.0 (Jenkins job ohltyler-integ-test-stable #73):

REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test execute query-level monitor with user having partial index permissions" -Dtests.seed=4F146AFCD0CB9E11 -Dtests.security.manager=false -Dtests.locale=de -Dtests.timezone=Africa/Bamako -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [PUT], host [https://localhost:9200], URI [/_plugins/_security/api/roles/hr_role], status line [HTTP/1.1 400 Bad Request]
    {"status":"error","reason":"Could not parse content of request."}
        at __randomizedtesting.SeedInfo.seed([4F146AFCD0CB9E11:45AE307301D985AE]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.AlertingRestTestCase.createIndexRoleWithDocLevelSecurity(AlertingRestTestCase.kt:848)
        at org.opensearch.alerting.AlertingRestTestCase.createUserWithDocLevelSecurityTestData(AlertingRestTestCase.kt:897)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test execute query-level monitor with user having partial index permissions(SecureMonitorRestApiIT.kt:524)
REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test execute bucket-level monitor with user having partial index permissions" -Dtests.seed=4F146AFCD0CB9E11 -Dtests.security.manager=false -Dtests.locale=de -Dtests.timezone=Africa/Bamako -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [PUT], host [https://localhost:9200], URI [/_plugins/_security/api/roles/hr_role], status line [HTTP/1.1 400 Bad Request]
    {"status":"error","reason":"Could not parse content of request."}
        at __randomizedtesting.SeedInfo.seed([4F146AFCD0CB9E11:6FCCA52B65917BDD]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.AlertingRestTestCase.createIndexRoleWithDocLevelSecurity(AlertingRestTestCase.kt:848)
        at org.opensearch.alerting.AlertingRestTestCase.createUserWithDocLevelSecurityTestData(AlertingRestTestCase.kt:897)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test execute bucket-level monitor with user having partial index permissions(SecureMonitorRestApiIT.kt:575)

qreshi commented 2 years ago

I believe the "test bucket-level monitor with acknowledged alert" failure is a rare case where a randomly generated Monitor configuration fails the expected assertions. We can put out a fix for that.
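If the diagnosis above is right, one common way to remove this kind of flakiness is to constrain the random generator so it can never hand the test a configuration its assertions cannot satisfy. The sketch below uses rejection sampling; FakeMonitorConfig, triggerFires, randomMonitorConfig and randomFiringMonitorConfig are hypothetical names, and the "matchingDocs > threshold" precondition is an illustrative assumption, not the real Monitor model or the fix that was actually merged.

```kotlin
import kotlin.random.Random

// Hypothetical stand-in for a randomly generated monitor. The real alerting
// test helpers build full Monitor objects; these two fields are illustrative only.
data class FakeMonitorConfig(val matchingDocs: Int, val threshold: Int)

// Hypothetical precondition the test's assertions depend on:
// the trigger only fires (so an alert only exists) when matchingDocs > threshold.
fun FakeMonitorConfig.triggerFires(): Boolean = matchingDocs > threshold

// Unconstrained draw: may produce a config that never fires.
fun randomMonitorConfig(rng: Random): FakeMonitorConfig =
    FakeMonitorConfig(matchingDocs = rng.nextInt(0, 10), threshold = rng.nextInt(0, 10))

// Rejection sampling: keep drawing until the config satisfies the precondition,
// so no seed can produce a configuration that legitimately yields no alert.
fun randomFiringMonitorConfig(rng: Random, maxDraws: Int = 1_000): FakeMonitorConfig {
    repeat(maxDraws) {
        val candidate = randomMonitorConfig(rng)
        if (candidate.triggerFires()) return candidate
    }
    error("No firing configuration found in $maxDraws draws")
}
```

With a fixed seed, for example Random(7), the same sequence of draws is produced every time, which mirrors how -Dtests.seed makes the randomized integ tests reproducible.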

For the "test execute query-level monitor with user having partial index permissions" and "test execute bucket-level monitor with user having partial index permissions" tests, a fix is currently in an open PR.

As for the other security-related Alerting tests, they've been passing on our 1.2 and main branches, so I'm not sure why they're failing here.

ohltyler commented 2 years ago

@qreshi Thanks for the update. I've updated the issue to track only the first failure you mentioned, since it isn't covered by an existing PR.

I think it's fine to ignore the security-related ones, as they're likely caused by under-resourced Jenkins hosts.

ohltyler commented 2 years ago

@qreshi @lezzago I'm seeing a few new security-related failures. I assume more bake time is needed for the permissions to take effect before the tests execute? Let me know what you think.
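If the bake-time guess above is correct, a test can tolerate a short propagation delay by retrying the request a bounded number of times instead of failing on the first 403. The sketch below is a hypothetical helper (retryUntilSuccess and demoRetry are illustrative names, not part of the alerting test suite) and is not the fix the maintainers applied; it catches any Exception for simplicity, whereas a real test would likely retry only on a 403 ResponseException.

```kotlin
// Hypothetical test helper: retries [block] until it returns without throwing,
// sleeping [delayMillis] between attempts, for at most [maxAttempts] attempts.
// A real test would likely catch only a 403 ResponseException, not any Exception.
fun <T> retryUntilSuccess(maxAttempts: Int = 5, delayMillis: Long = 500, block: () -> T): T {
    require(maxAttempts > 0) { "maxAttempts must be positive" }
    var lastError: Exception? = null
    repeat(maxAttempts) { attempt ->
        try {
            return block() // success: non-local return out of retryUntilSuccess
        } catch (e: Exception) {
            lastError = e
            // Wait before the next attempt, but not after the last one.
            if (attempt < maxAttempts - 1) Thread.sleep(delayMillis)
        }
    }
    throw IllegalStateException("Gave up after $maxAttempts attempts", lastError)
}

// Usage: simulates a request that returns 403 twice before permissions take effect.
fun demoRetry(): String {
    var calls = 0
    val status = retryUntilSuccess(maxAttempts = 5, delayMillis = 10) {
        calls++
        if (calls < 3) throw RuntimeException("403 Forbidden (simulated)")
        "200 OK"
    }
    return "$status after $calls calls"
}
```

Bounding the attempts keeps a genuine permissions bug from hanging the suite: after maxAttempts the last error is rethrown as the cause, so the original 403 still shows up in the report.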

REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with disabled filter by" -Dtests.seed=508AAB0DA496B3D0 -Dtests.security.manager=false -Dtests.locale=ar-QA -Dtests.timezone=Asia/Kathmandu -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200/], URI [/_plugins/_alerting/monitors/alerts?missing=_last&], status line [HTTP/1.1 403 Forbidden]
    {"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"},"status":403}
        at __randomizedtesting.SeedInfo.seed([508AAB0DA496B3D0:9522FCD60B218367]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.TestHelpersKt.makeRequest(TestHelpers.kt:478)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts(AlertingRestTestCase.kt:512)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts$default(AlertingRestTestCase.kt:505)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with disabled filter by(SecureMonitorRestApiIT.kt:632)
  2> فبر 11, 2022 4:58:15 ص org.opensearch.client.RestClient logResponse
REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with filter by" -Dtests.seed=99E6A35FD2F66DD0 -Dtests.security.manager=false -Dtests.locale=hu -Dtests.timezone=US/Eastern -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200/], URI [/_plugins/_alerting/monitors/alerts?missing=_last&], status line [HTTP/1.1 403 Forbidden]
    {"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"},"status":403}
        at __randomizedtesting.SeedInfo.seed([99E6A35FD2F66DD0:ECB20D4AC1A8839E]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.TestHelpersKt.makeRequest(TestHelpers.kt:478)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts(AlertingRestTestCase.kt:512)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts$default(AlertingRestTestCase.kt:505)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with filter by(SecureMonitorRestApiIT.kt:669)
  2> febr. 10, 2022 6:19:36 DU org.opensearch.client.RestClient logResponse
REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query monitors with disable filter by" -Dtests.seed=99E6A35FD2F66DD0 -Dtests.security.manager=false -Dtests.locale=hu -Dtests.timezone=US/Eastern -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [POST], host [https://localhost:9200/], URI [/_plugins/_alerting/monitors/_search], status line [HTTP/1.1 403 Forbidden]
    {"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/monitor/search] and User [name=userOne, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/monitor/search] and User [name=userOne, backend_roles=[], requestedTenant=null]"},"status":403}
        at __randomizedtesting.SeedInfo.seed([99E6A35FD2F66DD0:F20F0B23573970AA]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.TestHelpersKt.makeRequest(TestHelpers.kt:454)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query monitors with disable filter by(SecureMonitorRestApiIT.kt:436)
  2> febr. 10, 2022 6:19:41 DU org.opensearch.client.RestClient logResponse
2> REPRODUCE WITH: ./gradlew ':alerting:integTest' --tests "org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with disabled filter by" -Dtests.seed=99E6A35FD2F66DD0 -Dtests.security.manager=false -Dtests.locale=hu -Dtests.timezone=US/Eastern -Druntime.java=14
  2> org.opensearch.client.ResponseException: method [GET], host [https://localhost:9200/], URI [/_plugins/_alerting/monitors/alerts?missing=_last&], status line [HTTP/1.1 403 Forbidden]
    {"error":{"root_cause":[{"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"}],"type":"security_exception","reason":"no permissions for [cluster:admin/opendistro/alerting/alerts/get] and User [name=userOne, backend_roles=[], requestedTenant=null]"},"status":403}
        at __randomizedtesting.SeedInfo.seed([99E6A35FD2F66DD0:5C4EF4847D415D67]:0)
        at org.opensearch.client.RestClient.convertResponse(RestClient.java:344)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:314)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:289)
        at org.opensearch.alerting.TestHelpersKt.makeRequest(TestHelpers.kt:478)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts(AlertingRestTestCase.kt:512)
        at org.opensearch.alerting.AlertingRestTestCase.getAlerts$default(AlertingRestTestCase.kt:505)
        at org.opensearch.alerting.resthandler.SecureMonitorRestApiIT.test query all alerts in all states with disabled filter by(SecureMonitorRestApiIT.kt:632)
  2> febr. 10, 2022 6:20:22 DU org.opensearch.client.RestClient logResponse

bbarani commented 2 years ago

@qreshi @lezzago Can you provide an update on this issue?

qreshi commented 2 years ago

@ohltyler Yes, https://github.com/opensearch-project/alerting/pull/294 is merged now, so any other security test errors you're seeing may be specific to Jenkins, since we haven't seen them on our side.

As for "test bucket-level monitor with acknowledged alert", I'll work on a fix this week, and then we can close this issue.

bbarani commented 2 years ago

@qreshi Any update on this issue? It is also being tracked as part of the build.

qreshi commented 2 years ago

@bbarani All known flaky tests have been fixed (at least in main, which is on 1.3.0). Is the expectation to backport these to 1.2.0 as well? We don't typically do that, but the tracking issue mentions 1.2.0 too.