opensearch-project / cross-cluster-replication

Synchronize your data across multiple clusters for lower latencies and higher availability
https://opensearch.org/docs/latest/replication-plugin/index/
Apache License 2.0

[BUG] 'integTestRemote' setup failing #1361

Closed khushbr closed 1 month ago

khushbr commented 5 months ago

What is the bug? The integTestRemote cluster setup is failing, which breaks the integration tests, including UpdateAutoFollowPatternIT, StartReplicationIT and others. A sample stack trace is shared below; see the complete list of failures at https://build.ci.opensearch.org/blue/rest/organizations/jenkins/pipelines/integ-test/runs/8074/nodes/122/steps/672/log/?start=0

java.net.ConnectException: Connection refused
        at org.opensearch.client.RestClient.extractAndWrapCause(RestClient.java:954)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:333)
        at org.opensearch.client.RestClient.performRequest(RestClient.java:321)
        at org.opensearch.replication.MultiClusterRestTestCase.stopAllReplicationJobs(MultiClusterRestTestCase.kt:422)
        at org.opensearch.replication.MultiClusterRestTestCase.wipeIndicesFromCluster(MultiClusterRestTestCase.kt:446)
        at org.opensearch.replication.MultiClusterRestTestCase.wipeCluster(MultiClusterRestTestCase.kt:372)
        at org.opensearch.replication.MultiClusterRestTestCase.wipeClusters(MultiClusterRestTestCase.kt:367)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:48)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
        at org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
        at org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
        at org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
        at org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
        at org.junit.rules.RunRules.evaluate(RunRules.java:20)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
        at java.base/java.lang.Thread.run(Thread.java:1583)
        Caused by:
        java.net.ConnectException: Connection refused
            at java.base/sun.nio.ch.Net.pollConnect(Native Method)
            at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682)
            at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973)
            at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
            at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
            at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
            at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
            at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
            ... 1 more
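
The trace above shows the REST client getting "Connection refused" while the test base class wipes the clusters, which usually means nothing was listening on the target host and port at that moment. As a rough way to confirm that outside the test harness, here is a minimal sketch of a plain TCP reachability check. The class name `ClusterReachability`, the default host `localhost`, and the default port `9200` are assumptions for illustration only; they are not taken from this repository or its build scripts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the plugin or its test framework.
public class ClusterReachability {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused") and connect timeouts both end up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults; pass the real cluster host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9200;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

If this prints `false` for a cluster endpoint the remote tests are pointed at, the failure is likely in cluster startup or endpoint configuration rather than in the individual test cases.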

How can one reproduce the bug? Steps to reproduce the behavior:

REPRODUCE WITH: ./gradlew ':integTestRemote' --tests "org.opensearch.replication.integ.rest.StartReplicationIT.test that replication cannot be started on leader alias directly" -Dtests.seed=EAF3B91F9321D5FF -Dtests.security.manager=true -Dtests.locale=en-SG -Dtests.timezone=Asia/Yekaterinburg -Druntime.java=21

What is the expected behavior? All the integration tests should pass.

dblock commented 2 months ago

Catch All Triage - 1 2 3 4 5 6

@khushbr Has this been fully fixed in the related PRs? Close?

nisgoel-amazon commented 1 month ago

This is fixed. Can someone close this issue? @monusingh-1