opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

BWC tests failing for org.opensearch.knn.bwc.IndexingIT.testKNNDefaultIndexSettings #1622

Open jmazanec15 opened 7 months ago

jmazanec15 commented 7 months ago

Description

Looks like the BWC tests are failing (#1620). Unsure yet whether the test is flaky or the failure was caused by a recent change. Needs investigation.

Test is org.opensearch.knn.bwc.IndexingIT.testKNNDefaultIndexSettings.

Failure looks like:

REPRODUCE WITH: ./gradlew ':qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster' --tests "org.opensearch.knn.bwc.IndexingIT.testKNNDefaultIndexSettings" -Dtests.seed=4E6108AECADC0863 -Dtests.security.manager=false -Dtests.bwc.version=2.14.0-SNAPSHOT -Dtests.locale=es-SV -Dtests.timezone=Pacific/Port_Moresby -Druntime.java=17
> Task :qa:rolling-upgrade:testAgainstOneThirdUpgradedCluster

org.opensearch.knn.bwc.IndexingIT > testKNNDefaultIndexSettings FAILED
    org.opensearch.client.ResponseException: method [POST], host [http://[::1]:39553], URI [/knn-bwc-testknndefaultindexsettings/_search?explain=true&size=5&search_type=query_then_fetch], status line [HTTP/1.1 500 Internal Server Error]
    {"error":{"root_cause":[{"type":"illegal_state_exception","reason":"unexpected byte [0x05]"}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"knn-bwc-testknndefaultindexsettings","node":"wDLPdQo-TMWQOWvfiZYk8A","reason":{"type":"illegal_state_exception","reason":"unexpected byte [0x05]"}}],"caused_by":{"type":"illegal_state_exception","reason":"unexpected byte [0x05]","caused_by":{"type":"illegal_state_exception","reason":"unexpected byte [0x05]"}}},"status":500}
        at __randomizedtesting.SeedInfo.seed([4E6108AECADC0863:AAEB4444A0E50D13]:0)
        at app//org.opensearch.client.RestClient.convertResponse(RestClient.java:385)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:355)
        at app//org.opensearch.client.RestClient.performRequest(RestClient.java:330)
        at app//org.opensearch.knn.KNNRestTestCase.searchKNNIndex(KNNRestTestCase.java:199)
        at app//org.opensearch.knn.KNNRestTestCase.validateKNNSearch(KNNRestTestCase.java:1118)
        at app//org.opensearch.knn.bwc.IndexingIT.validateKNNIndexingOnUpgrade(IndexingIT.java:131)
        at app//org.opensearch.knn.bwc.IndexingIT.testKNNDefaultIndexSettings(IndexingIT.java:38)

  1> [2024-04-18T08:10:29,177][INFO ][o.o.k.b.IndexingIT       ] [testKNNIndexCreation_withMethodMapper] before test
  1> [2024-04-18T08:10:29,181][INFO ][o.o.k.b.IndexingIT       ] [testKNNIndexCreation_withMethodMapper] initializing REST clients against [http://[::1]:40895, http://127.0.0.1:45895, http://[::1]:39553, http://127.0.0.1:40941, http://[::1]:33243, http://127.0.0.1:35707]
  1> [2024-04-18T08:10:29,916][INFO ][o.o.k.b.IndexingIT       ] [testKNNIndexCreation_withMethodMapper] after test
  1> [2024-04-18T08:10:29,923][INFO ][o.o.k.b.IndexingIT       ] [testKNNDefaultIndexSettings] before test

...
 »  Caused by: java.lang.IllegalStateException: unexpected byte [0x05]
»   at org.opensearch.core.common.io.stream.StreamInput.readBoolean(StreamInput.java:593) ~[opensearch-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.core.common.io.stream.StreamInput.readBoolean(StreamInput.java:583) ~[opensearch-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.search.builder.SearchSourceBuilder.<init>(SearchSourceBuilder.java:251) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.core.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:977) ~[opensearch-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.search.internal.ShardSearchRequest.<init>(ShardSearchRequest.java:244) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.RequestHandlerRegistry.newRequest(RequestHandlerRegistry.java:87) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.NativeMessageHandler.newRequest(NativeMessageHandler.java:309) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.NativeMessageHandler.handleRequest(NativeMessageHandler.java:264) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:139) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:119) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:108) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:100) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:784) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:157) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:94) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119) ~[opensearch-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
»   at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»   at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»   at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[?:?]
»   at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[?:?]
»   at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[?:?]
»   at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
»   at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[?:?]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
»   at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
»   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
»   at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
»   at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
»   at java.lang.Thread.run(Thread.java:840) ~[?:?]
jmazanec15 commented 7 months ago

https://github.com/opensearch-project/k-NN/actions/runs/8729447278/job/23951401849?pr=1499

jmazanec15 commented 3 months ago

Something is going wrong when the stream is deserialized. See:

»   at org.opensearch.core.common.io.stream.StreamInput.readOptionalWriteable(StreamInput.java:977) ~[opensearch-core-2.14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]

~I'm guessing it's around this: https://github.com/opensearch-project/k-NN/blob/2.14.0.0/src/main/java/org/opensearch/knn/index/query/KNNQueryBuilder.java#L221.~ NVM

The read is happening on the 2.14 side (the trace shows the 2.14.0-SNAPSHOT jars), so the issue is in what a 3.0 node writes when it sends the request to a 2.14 node. Unable to reproduce so far.
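
For illustration only, here is a minimal sketch of how this kind of writer/reader mismatch can produce `unexpected byte [0x05]`. It is not OpenSearch code and not a confirmed diagnosis: `StreamMismatchSketch`, `writeAsNewerNode`, `readAsOlderNode`, and the extra byte `5` are all hypothetical, and `readStrictBoolean` only imitates a boolean read that accepts nothing but 0 or 1. The assumption is that the newer node writes a field the older node does not expect at that position.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a mixed-version cluster; not the real transport code.
final class StreamMismatchSketch {

    // Strict boolean read: only the bytes 0 and 1 are valid, anything else is an error.
    static boolean readStrictBoolean(DataInputStream in) throws IOException {
        byte value = in.readByte();
        if (value == 0) {
            return false;
        }
        if (value == 1) {
            return true;
        }
        throw new IllegalStateException(String.format("unexpected byte [0x%02x]", value));
    }

    // Hypothetical newer node: optionally writes an extra field before the boolean.
    static byte[] writeAsNewerNode(boolean writeExtraField) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (writeExtraField) {
                out.writeByte(5); // assumed extra field the older reader knows nothing about
            }
            out.writeBoolean(true); // the field the older reader expects first
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical older node: expects the boolean to be the first byte of the payload.
    static String readAsOlderNode(byte[] payload) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(payload))) {
            return "ok: " + readStrictBoolean(in);
        } catch (IllegalStateException e) {
            return "failed: " + e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Writer and reader agree on the layout.
        System.out.println(readAsOlderNode(writeAsNewerNode(false)));
        // Writer emits a field the reader does not expect, so the boolean read sees 0x05.
        System.out.println(readAsOlderNode(writeAsNewerNode(true)));
    }
}
```

If this is the shape of the problem, the usual remedy is for the writer to emit the extra field only when the receiving node's version supports it. Which field (if any) is involved here is still unknown.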

jmazanec15 commented 3 months ago

In SearchSourceBuilder, it's failing here: https://github.com/opensearch-project/OpenSearch/blob/bcd2d8a5197e7aa4a9239d76fbedc85d3c554da8/server/src/main/java/org/opensearch/search/builder/SearchSourceBuilder.java#L251