opensearch-project / asynchronous-search

Run queries in the background and retrieve partial results along the way
https://opensearch.org/docs/latest/search-plugins/async/index/
Apache License 2.0

[BUG] linux-test-docker integTests failing on remote cluster #667

Status: Open. Opened by finnegancarroll 1 week ago.

What is the bug?

When testing locally, the integTestRemote task passes but integTest does not. Because integTestRemote only runs the REST tests, the other integration tests may be failing silently there.

How can one reproduce the bug?

Tested with:

export plugin=opensearch-asynchronous-search-3.0.0.0-SNAPSHOT.zip
export version=3.0.0
export plugin_version=3.0.0.0
export qualifier=
export docker_version=3.0.0

echo "FROM opensearchstaging/opensearch:$docker_version" > Dockerfile
echo "RUN if [ -d /usr/share/opensearch/plugins/opensearch-asynchronous-search ]; then /usr/share/opensearch/bin/opensearch-plugin remove opensearch-asynchronous-search; fi" >> Dockerfile
echo "ADD $plugin /tmp/" >> Dockerfile
echo "RUN /usr/share/opensearch/bin/opensearch-plugin install --batch file:/tmp/$plugin" >> Dockerfile

docker build -t opensearch-asynchronous-search:test .

docker run \
-p 9200:9200 \
-p 9600:9600 \
-d \
-e OPENSEARCH_INITIAL_ADMIN_PASSWORD=myStrongPassword123! \
-e discovery.type=single-node \
opensearch-asynchronous-search:test

./gradlew integTest \
  -Dtests.rest.cluster=localhost:9200 \
  -Dtests.cluster=localhost:9200 \
  -Dtests.clustername="docker-cluster" \
  -Dhttps=true \
  -Duser=admin \
  -Dpassword=myStrongPassword123!
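Before starting ./gradlew integTest, it may be worth confirming that the container started above is actually answering on localhost:9200, so that test failures are not confused with a cluster that is still booting. A minimal sketch, assuming bash, curl, and the admin credentials passed to docker run; wait_for_cluster is a hypothetical helper (not part of the repository), and -k assumes the image serves its self-signed demo TLS certificate.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): poll the Docker cluster
# over HTTPS until it responds, or give up after a fixed number of attempts.
# -k skips TLS verification, assuming the image's self-signed demo certificate.
wait_for_cluster() {
  local url="$1"
  local retries="${2:-30}"   # number of attempts
  local delay="${3:-5}"      # seconds to wait between attempts
  local password='myStrongPassword123!'   # single quotes avoid history expansion
  password="${4:-$password}"
  local i
  for ((i = 1; i <= retries; i++)); do
    # --fail makes curl exit non-zero on HTTP errors such as 401 or 503;
    # --max-time bounds each attempt so a hung connection cannot block forever
    if curl -ks --fail --max-time 5 -o /dev/null -u "admin:${password}" "$url"; then
      echo "cluster reachable at $url"
      return 0
    fi
    if (( i < retries )); then
      sleep "$delay"
    fi
  done
  echo "cluster not reachable at $url after $retries attempts" >&2
  return 1
}
```

Chaining it as wait_for_cluster https://localhost:9200/_cluster/health && ./gradlew integTest (with the same -D flags as above) only starts the tests once the cluster answers.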

Test failures:

Tests with failures:
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchQueryIT.testEmptyQueryString
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchQueryIT.testHighlighterQuery
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchQueryIT.testIpRangeQuery
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchQueryIT.testAggregationQuery
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchRejectionIT.testSimulatedSearchRejectionLoad
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchRejectionIT.testSearchFailures
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchTaskCancellationIT.testCancellationDuringQueryPhase
 - org.opensearch.search.asynchronous.integTests.AsynchronousSearchTaskCancellationIT.testCancellationDuringFetchPhase
 - org.opensearch.search.asynchronous.listener.AsynchronousSearchCancellationIT.testCancellationDuringQueryPhase
 - org.opensearch.search.asynchronous.listener.AsynchronousSearchCancellationIT.testCancellationDuringFetchPhase
 - org.opensearch.search.asynchronous.listener.AsynchronousSearchPartialResponseIT.testPartialReduceBuckets
 - org.opensearch.search.asynchronous.management.AsynchronousSearchManagementServiceIT.testCleansUpExpiredAsynchronousSearchDuringQueryPhase
 - org.opensearch.search.asynchronous.management.AsynchronousSearchManagementServiceIT.testDeletesExpiredAsynchronousSearchResponseFromPersistedStore
 - org.opensearch.search.asynchronous.management.AsynchronousSearchManagementServiceIT.testCleansUpExpiredAsynchronousSearchDuringFetchPhase
 - org.opensearch.search.asynchronous.request.AsynchronousSearchRequestRoutingIT.testRequestForwardingToCoordinatorNodeForPersistedAsynchronousSearch
 - org.opensearch.search.asynchronous.request.AsynchronousSearchRequestRoutingIT.testRequestForwardingToCoordinatorNodeForRunningAsynchronousSearch
 - org.opensearch.search.asynchronous.request.AsynchronousSearchRequestRoutingIT.testInvalidIdRequestHandling
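To narrow the failures down, it may help to re-run one class from the list above on its own. A minimal sketch, assuming bash and that integTest is a regular Gradle Test task, so Gradle's standard --tests filter applies; run_single_it and the DRY_RUN switch are hypothetical names, and the system properties are copied from the reproduction command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): re-run one failing IT
# class against the Docker cluster using Gradle's standard --tests filter,
# assuming integTest is a Gradle Test task that honours that option.
# Set DRY_RUN=1 to print the command instead of executing it.
run_single_it() {
  local test_class="$1"
  local cmd=(
    ./gradlew integTest
    --tests "$test_class"
    -Dtests.rest.cluster=localhost:9200
    -Dtests.cluster=localhost:9200
    -Dtests.clustername=docker-cluster
    -Dhttps=true
    -Duser=admin
    '-Dpassword=myStrongPassword123!'
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Join the array with spaces for display only
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, run_single_it org.opensearch.search.asynchronous.integTests.AsynchronousSearchQueryIT runs only that class; Gradle's --tests option also accepts wildcard patterns such as '*AsynchronousSearchQueryIT'.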

These failures are often accompanied by 'failed to connect' errors, which appear to be transport-related:

  1> org.opensearch.transport.ConnectTransportException: [node_s2][127.0.0.1:44359] connect_exception
  1>    at org.opensearch.transport.TcpTransport$ChannelsConnectedListener.onFailure(TcpTransport.java:1106) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.core.action.ActionListener.lambda$toBiConsumer$2(ActionListener.java:217) ~[opensearch-core-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.common.concurrent.CompletableContext.lambda$addListener$0(CompletableContext.java:57) ~[opensearch-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863) ~[?:?]
  1>    at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841) ~[?:?]
  1>    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510) ~[?:?]
  1>    at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194) ~[?:?]
  1>    at org.opensearch.common.concurrent.CompletableContext.completeExceptionally(CompletableContext.java:72) ~[opensearch-common-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.SocketChannelContext.connect(SocketChannelContext.java:160) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.EventHandler.handleConnect(EventHandler.java:130) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.transport.nio.TestEventHandler.handleConnect(TestEventHandler.java:139) ~[framework-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.attemptConnect(NioSelector.java:446) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.registerChannel(NioSelector.java:469) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.setUpNewChannels(NioSelector.java:458) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.preSelect(NioSelector.java:279) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.singleLoop(NioSelector.java:172) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at org.opensearch.nio.NioSelector.runLoop(NioSelector.java:148) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
  1> Caused by: java.net.ConnectException: Connection refused
  1>    at java.base/sun.nio.ch.Net.pollConnect(Native Method) ~[?:?]
  1>    at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:682) ~[?:?]
  1>    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:973) ~[?:?]
  1>    at org.opensearch.nio.SocketChannelContext.connect(SocketChannelContext.java:157) ~[opensearch-nio-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
  1>    ... 9 more
  1> [2024-11-06T23:20:29,284][WARN ][o.o.c.NodeConnectionsService] [node_s4] failed to connect to {node_s3}{7lJLTNV_SMeqFgtL1SJsew}{DWMlPFFURwiVGq46clX-QQ}{127.0.0.1}{127.0.0.1:45207}{dimr}{shard_indexing_pressure_enabled=true} (tried [1] times)
  1> org.opensearch.transport.ConnectTransportException: [node_s3][127.0.0.1:45207] connect_exception
  1>    ... (stack trace, including "Caused by: java.net.ConnectException: Connection refused", identical to the first exception above)
  1> [2024-11-06T23:20:29,285][WARN ][o.o.c.NodeConnectionsService] [node_s5] failed to connect to {node_s3}{7lJLTNV_SMeqFgtL1SJsew}{DWMlPFFURwiVGq46clX-QQ}{127.0.0.1}{127.0.0.1:45207}{dimr}{shard_indexing_pressure_enabled=true} (tried [1] times)
  1> org.opensearch.transport.ConnectTransportException: [node_s3][127.0.0.1:45207] connect_exception
  1>    ... (stack trace, including "Caused by: java.net.ConnectException: Connection refused", identical to the first exception above)
  1> [2024-11-06T23:20:29,285][WARN ][o.o.c.NodeConnectionsService] [node_s5] failed to connect to {node_s2}{jkL01aTcRJu2tQDHvKWbyg}{0LFBkj48Q9G2NAG2z50obg}{127.0.0.1}{127.0.0.1:44359}{dimr}{shard_indexing_pressure_enabled=true} (tried [1] times)
  1> org.opensearch.transport.ConnectTransportException: [node_s2][127.0.0.1:44359] connect_exception
  1>    ... (stack trace, including "Caused by: java.net.ConnectException: Connection refused", identical to the first exception above)

What is the expected behavior?

The integration tests (integTest) pass against the remote cluster.