Closed: martin-gaievski closed this 5 days ago
Tried one more set of steps that involves a custom analyzer for the Korean language (`nori`, https://mvnrepository.com/artifact/org.opensearch.plugin/analysis-nori) and a much simpler query:
GET /bedrock-rag-chunk-500/_search
{
"_source": {
"excludes": "body_chunk_embedding"
},
"search_pipeline": {
"description": "Post processor for hybrid search",
"phase_results_processors": [
{
"normalization-processor": {
"normalization": {
"technique": "min_max"
},
"combination": {
"technique": "arithmetic_mean",
"parameters": {
"weights": [
0.3,
0.7
]
}
}
}
}
]
},
"query": {
"hybrid": {
"queries": [
{
"multi_match": {
"query": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
"fields": [
"body_chunk_default",
"summary",
"title"
]
}
},
{
"neural": {
"body_chunk_embedding": {
"query_text": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
"model_id": "ZSPJiJIBFB9Oq1h87Iid",
"k": 5
}
}
}
]
}
}
}
Got the same type of response, and found the following stack trace in the log:
[2024-10-23T23:55:28,952][WARN ][r.suppressed ] [a2999586b1baa510e1c3a4e65c95804b] path: __PATH__ params: {pretty=true, index=bedrock-rag-chunk-500}
Failed to execute phase [query], all shards failed; shardFailures {[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][0]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][1]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][2]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][3]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }{[etHmJm3BTbe1VC3k1CahCA][bedrock-rag-chunk-500][4]: RemoteTransportException[[4ffd32c623b597fabb2e803c1335c651][__IP__][__PATH__[__PATH__]]]; nested: QueryPhaseExecutionException[Query Failed [Failed to execute main query]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; }
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:780)
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:397)
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:820)
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:558)
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:318)
at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104)
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:75)
at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:764)
at org.opensearch.transport.TransportService$9.handleException(TransportService.java:1729)
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:436)
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1515)
at org.opensearch.transport.NativeMessageHandler.lambda$handleException$5(NativeMessageHandler.java:454)
at org.opensearch.common.util.concurrent.OpenSearchExecutors$DirectExecutorService.execute(OpenSearchExecutors.java:412)
at org.opensearch.transport.NativeMessageHandler.handleException(NativeMessageHandler.java:452)
at org.opensearch.transport.NativeMessageHandler.handlerResponseError(NativeMessageHandler.java:444)
at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:172)
at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:126)
at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:121)
at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:113)
at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:800)
at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:157)
at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:94)
at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143)
at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119)
at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1407)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:918)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:994)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at __PATH__(Thread.java:1583)
Caused by: OpenSearchException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]]; nested: EOFException[read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]];
at org.opensearch.OpenSearchException.guessRootCauses(OpenSearchException.java:710)
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:395)
... 51 more
Caused by: java.io.EOFException: read past EOF (pos=2147483647): MemorySegmentIndexInput(__PATH__) [slice=_0.nvd] [slice=randomaccess]
at org.apache.lucene.store.MemorySegmentIndexInput.handlePositionalIOOBE(MemorySegmentIndexInput.java:100)
at org.apache.lucene.store.MemorySegmentIndexInput$SingleSegmentImpl.readByte(MemorySegmentIndexInput.java:543)
at org.apache.lucene.codecs.lucene90.Lucene90NormsProducer$3.longValue(Lucene90NormsProducer.java:389)
at org.apache.lucene.search.LeafSimScorer.getNormValue(LeafSimScorer.java:47)
at org.apache.lucene.search.LeafSimScorer.score(LeafSimScorer.java:60)
at org.apache.lucene.search.TermScorer.score(TermScorer.java:86)
at org.apache.lucene.search.DisjunctionSumScorer.score(DisjunctionSumScorer.java:41)
at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178)
at org.apache.lucene.search.DisjunctionMaxScorer.score(DisjunctionMaxScorer.java:65)
at org.apache.lucene.search.DisjunctionScorer.score(DisjunctionScorer.java:178)
at org.opensearch.neuralsearch.query.HybridQueryScorer.hybridScores(HybridQueryScorer.java:193)
at org.opensearch.neuralsearch.search.collector.HybridTopScoreDocCollector$1.collect(HybridTopScoreDocCollector.java:100)
at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:226)
at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:296)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:236)
at org.opensearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:71)
at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:38)
at org.opensearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:327)
at org.opensearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:283)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
at org.opensearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:361)
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWithCollector(QueryPhase.java:468)
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher$DefaultQueryPhaseSearcherWithEmptyQueryCollectorContext.searchWithCollector(HybridQueryPhaseSearcher.java:199)
at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:438)
at org.opensearch.neuralsearch.search.query.HybridQueryPhaseSearcher.searchWith(HybridQueryPhaseSearcher.java:65)
at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:284)
at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:157)
at org.opensearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:589)
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:653)
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:622)
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:74)
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:89)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:950)
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.lang.Thread.run(Thread.java:1583)
A few things to notice:
@martin-gaievski Can you share the index configuration you used for reproducing the bug?
@vibrantvarun yes, I'm using the following config for the index and ingest pipeline:
PUT /bedrock-rag-chunk-500
{
"settings": {
"default_pipeline": "bedrock-v2-ingest-pipeline",
"index": {
"knn": true
},
"analysis": {
"tokenizer": {
"nori_sample_dict": {
"type": "nori_tokenizer",
"decompound_mode": "mixed",
"user_dictionary_rules": [
"c++",
"java",
"python"
]
}
},
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"pos_f": {
"type": "nori_part_of_speech",
"stoptags": [
"SP"
,"SF"
,"VCP"
,"NP"
]
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "nori_sample_dict",
"filter": [
"lowercase",
"stop",
"pos_f",
"autocomplete_filter"
],
"char_filter": [
"html_strip"
]
}
}
}
},
"mappings": {
"properties": {
"title" :{
"type": "text",
"analyzer": "autocomplete"
},
"summary" :{
"type": "text",
"analyzer": "autocomplete"
},
"auditor" :{
"type": "keyword",
"null_value": "deactivate"
},
"body_chunk_default": {
"type": "text",
"analyzer": "autocomplete"
},
"body_chunk_embedding":{
"type": "knn_vector",
"dimension": 1536,
"method": {
"engine": "faiss",
"space_type": "l2",
"name": "hnsw"
}
}
}
}
}
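As a quick sanity check that the custom analyzer is wired up, the _analyze API can be run against the index (a minimal sketch; the sample text here is arbitrary):
GET /bedrock-rag-chunk-500/_analyze
{
  "analyzer": "autocomplete",
  "text": "금융분야 클라우드컴퓨팅서비스 이용 가이드"
}
The returned tokens should reflect the nori tokenization plus the edge_ngram expansion configured above.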
ingest pipeline:
PUT /_ingest/pipeline/bedrock-v2-ingest-pipeline
{
"description": "Bedrock Titan v2 Embedding pipeline",
"processors": [
{
"text_embedding": {
"model_id": "<model_id>",
"field_map": {
"body_chunk_default": "body_chunk_embedding"
}
}
}
]
}
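To check that the pipeline actually produces embeddings before indexing anything, it can be exercised with the _simulate API (a sketch; the sample document is arbitrary):
POST /_ingest/pipeline/bedrock-v2-ingest-pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "body_chunk_default": "금융분야 클라우드컴퓨팅서비스 이용 가이드"
      }
    }
  ]
}
If the model is reachable, the simulated document should come back with a body_chunk_embedding array of 1536 floats.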
I'm using a Bedrock text embeddings model with 1536 dimensions, but I suspect that's not a critical piece; you should see the issue with any other model.
The following steps can be taken to reproduce the issue in a local cluster using the open source version and a local model. It involves some manual tweaks for installing the nori analyzer plugin by hand. Credit goes to @vibrantvarun for finding these steps.
This is a developer setup; you need to have the neural-search repository locally, check the Dev guide for details.
Getting artifacts for the nori plugin
Download the nori analyzer plugin using the following link (replace the version you need):
https://ci.opensearch.org/ci/dbc/distribution-build-opensearch/3.0.0/latest/linux/x64/tar/builds/opensearch/core-plugins/analysis-icu-3.0.0.zip. Extract the contents of that zip archive into a local folder, e.g. /Users/user/dev/opensearch/libs.
In the local neural-search folder, add a flatDir section under repositories in the repositories.gradle file; the path should point to the local folder created in the previous step:
repositories {
mavenLocal()
maven { url "https://aws.oss.sonatype.org/content/repositories/snapshots" }
mavenCentral()
maven { url "https://plugins.gradle.org/m2/" }
flatDir {
dirs "/Users/user/dev/opensearch/libs"
}
}
Make the following changes in the build.gradle file:
def knnJarDirectory = "$buildDir/dependencies/opensearch-knn"
def analysisDirectory="$buildDir/dependencies/analysis-nori" //added line
dependencies {
api "org.opensearch:opensearch:${opensearch_version}"
zipArchive group: 'org.opensearch.plugin', name:'opensearch-knn', version: "${opensearch_build}"
zipArchive group: 'org.opensearch.plugin', name:'opensearch-ml-plugin', version: "${opensearch_build}"
zipArchive name:'analysis-nori', version: "${version_tokens[0]}", ext:'zip' //added line
secureIntegTestPluginArchive group: 'org.opensearch.plugin', name:'opensearch-security', version: "${opensearch_build}"
compileOnly fileTree(dir: knnJarDirectory, include: "opensearch-knn-${opensearch_build}.jar")
compileOnly fileTree(dir: analysisDirectory, include: "analysis-nori-${opensearch_build}.jar") //added line
    compileOnly group: 'com.google.guava', name: 'guava', version:'32.1.3-jre'
    // ... rest of the dependencies block unchanged
}
Run gradle run; it should start a dev cluster locally. If it fails, review the steps again; most probably one of the file paths or variables is not defined correctly. A quick way to confirm the plugin was picked up is shown below.
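The installed plugins can be listed to confirm the nori plugin actually made it into the dev cluster (a sketch):
GET {{base_url}}/_cat/plugins?v
analysis-nori should appear in the output if the manual wiring above worked.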
Steps to reproduce the error
1. create index
PUT {{base_url}}/rag-chunk-500
{
"settings": {
"default_pipeline": "ingest-pipeline",
"number_of_shards": 5,
"number_of_replicas": 2,
"index": {
"knn": true
},
"analysis": {
"tokenizer": {
"nori_sample_dict": {
"type": "nori_tokenizer",
"decompound_mode": "mixed",
"user_dictionary_rules": [
"c++",
"java",
"python"
]
}
},
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 20
},
"pos_f": {
"type": "nori_part_of_speech",
"stoptags": [
"SP"
,"SF"
,"VCP"
,"NP"
]
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "nori_sample_dict",
"filter": [
"lowercase",
"stop",
"pos_f",
"autocomplete_filter"
],
"char_filter": [
"html_strip"
]
}
}
}
},
"mappings": {
"properties": {
"title" :{
"type": "text",
"analyzer": "autocomplete"
},
"summary" :{
"type": "text",
"analyzer": "autocomplete"
},
"auditor" :{
"type": "keyword",
"null_value": "deactivate"
},
"body_chunk_default": {
"type": "text",
"analyzer": "autocomplete"
},
"body_chunk_embedding":{
"type": "knn_vector",
"dimension": 384,
"method": {
"engine": "lucene",
"space_type": "l2",
"name": "hnsw"
}
}
}
}
}
2. update cluster settings
PUT {{base_url}}/_cluster/settings
{
"persistent":{
"plugins.ml_commons.native_memory_threshold": 99,
"plugins.ml_commons.only_run_on_ml_node": false,
"plugins.ml_commons.model_access_control_enabled": true
}
}
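The applied values can be verified with a simple read-back (a sketch):
GET {{base_url}}/_cluster/settings?flat_settings=true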
3. create model group
POST {{base_url}}/_plugins/_ml/model_groups/_register
{
"name": "test_model_group_public",
"description": "This is a public model group"
}
{ "model_group_id": "T3o-ZJMB_dO27nZ9taIf", "status": "CREATED" }
4. register model
POST {{base_url}}/_plugins/_ml/models/_register?deploy=true
{
  "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
  "version": "1.0.1",
  "model_format": "TORCH_SCRIPT",
  "model_group_id": "T3o-ZJMB_dO27nZ9taIf"
}
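The register call returns a task_id; once that task completes, it exposes the model_id used in the remaining steps (UXo_ZJMB_dO27nZ9IKL8 here). A sketch of the status check, with <task_id> as a placeholder:
GET {{base_url}}/_plugins/_ml/tasks/<task_id>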
5. deploy model
POST {{base_url}}/_plugins/_ml/models/UXo_ZJMB_dO27nZ9IKL8/_deploy
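It's worth confirming the model finished deploying before creating the ingest pipeline; a sketch of the check (the response should report model_state as DEPLOYED):
GET {{base_url}}/_plugins/_ml/models/UXo_ZJMB_dO27nZ9IKL8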
6. create ingest pipeline
PUT {{base_url}}/_ingest/pipeline/ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "UXo_ZJMB_dO27nZ9IKL8",
        "field_map": {
          "body_chunk_default": "body_chunk_embedding"
        }
      }
    }
  ]
}
7. ingest docs in bulk
POST {{base_url}}/rag-chunk-500/_bulk?refresh
[bulk_doc_ingest.json](https://github.com/user-attachments/files/17906614/bulk_doc_ingest.json)
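The bulk request body follows the standard NDJSON format; a minimal sketch of what each document pair looks like (field names taken from the mapping above, values are placeholders; the real attached file contains the Korean documents):
{ "index": {} }
{ "title": "...", "summary": "...", "body_chunk_default": "..." }
{ "index": {} }
{ "title": "...", "summary": "...", "body_chunk_default": "..." }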
8. create search pipeline
PUT {{base_url}}/_search/pipeline/nlp-search-pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {}
        }
      }
    }
  ]
}
9. run hybrid search query
GET {{base_url}}/rag-chunk-500/_search?search_pipeline=nlp-search-pipeline
{
  "_source": {
    "excludes": "body_chunk_embedding"
  },
  "query": {
    "hybrid": {
      "queries": [
        {
          "multi_match": {
            "query": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
            "fields": [
              "body_chunk_default",
              "summary",
              "title"
            ]
          }
        },
        {
          "neural": {
            "body_chunk_embedding": {
              "query_text": "금융분야 클라우드컴퓨팅서비스 이용 가이드에 따르면, 중대한 변경이나 중요 계약사항 미이행 시 사후보고는 어떻게 해야 하며, 비중요업무의 경우 어떤 규정을 따라야 합니까?",
              "model_id": "UXo_ZJMB_dO27nZ9IKL8",
              "k": 5
            }
          }
        }
      ]
    }
  }
}
You'll get a response with an EOF error, something like:
"took": 136, "timed_out": false, "_shards": { "total": 5, "successful": 4, "skipped": 0, "failed": 1, "failures": [ { "shard": 2, "index": "rag-chunk-500", "node": "9rL-bxGHQJqkPBrIOhYlyw", "reason": { "type": "e_o_f_exception", "reason": """read past EOF (pos=2147483647): MemorySegmentIndexInput(path="/hdd1/mnt/env/root/apollo/env/swift-us-east-1-prod-OS_2_15AMI-ES2-p001/var/es/data/nodes/0/indices/cmDxLCz4S8WvwnBENx_S0g/2/index/_b.nvd") [slice=randomaccess]""" } } ] } ...
The fix has been merged to 2.x and main; we're on track for 2.19.
What is the bug?
When running a hybrid query with a certain complex structure, the response contains failures for some shards:
It may be an issue similar to the one reported for 2.13: https://github.com/opensearch-project/neural-search/issues/621
How can one reproduce the bug?
For an index with text-based documents, run a hybrid query with the following structure:
Below is an example of the response:
What is your host/environment?
We tested on OS 2.15 and the latest main. The same query works correctly in 2.13.
Do you have any additional context?
Looks like the issue depends on the number of documents present in the index. The same query works fine with ~20K documents but starts failing with 40K documents. My guess is that it's related to some optimization technique in one of the sub-queries.