opensearch-project / k-NN

🆕 Find the k-nearest neighbors (k-NN) for your vector data
https://opensearch.org/docs/latest/search-plugins/knn/index/
Apache License 2.0

[Bug]: refresh call answering 404 #1037

Closed annassommer closed 1 year ago

annassommer commented 1 year ago

Describe the bug

Hello, I am using OpenSearch together with LangChain in a Node project. Using LangChain, I have been trying to set up a vector store with the fromDocuments function:

    vectorStore = await OpenSearchVectorStore.fromDocuments(
      docOutputList,
      new OpenAIEmbeddings({ batchSize: 1, maxConcurrency: 1 }),
      {
        client,
        indexName: process.env.OPENSEARCH_INDEX || opensearchIndex,
      }
    )

However, this function always throws a 404 error without further information:

    ResponseError: Response Error
        at onBody (/var/task/node_modules/@opensearch-project/opensearch/lib/Transport.js:425:23)
        at IncomingMessage.onEnd (/var/task/node_modules/@opensearch-project/opensearch/lib/Transport.js:340:11)
        at IncomingMessage.emit (node:events:525:35)
        at IncomingMessage.emit (node:domain:489:12)
        at endReadableNT (node:internal/streams/readable:1358:12)
        at processTicksAndRejections (node:internal/process/task_queues:83:21) {
      meta: {
        body: '',
        statusCode: 404,
        headers: {
          'x-request-id': 'xxx',
          'x-amzn-aoss-test-account-id': 'xxx',
          'x-amzn-aoss-test-collection-id': 'xxx',
          date: 'Mon, 07 Aug 2023 13:14:58 GMT',
          server: 'aoss-amazon',
          'content-length': '0'
        },
        meta: {
          context: null,
          request: [Object],
          name: 'opensearch-js',
          connection: [Object],
          attempts: 0,
          aborted: false
        }
      }
    }

I analysed the function to see where exactly the error comes from and was able to pinpoint it. LangChain's OpenSearch vector store successfully creates the index using

    await this.client.indices.create({ index: this.indexName, body });

and successfully loads the documents into the vector store using

    await this.client.bulk({ body: operations });

The error happens at the end, when the index is refreshed using

    await this.client.indices.refresh({ index: this.indexName });

I thought the index might not have been created correctly. However, when I skip index creation entirely, the same function returns an explicit error message saying that the index was not found, which I do not get when the index exists.

The 404 error leads me to believe that the URL used for the refresh call is not found. Since all other calls (create, bulk) worked, I suspect that the URL generated by the refresh function is somehow incorrect. It looks like this: https://projectid.eu-central-1.aoss.amazonaws.com/index/_refresh. As said before, the index exists. Any help solving this problem would be appreciated.
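
For anyone trying to narrow down a failure like this, the step-by-step isolation described above can be sketched roughly as follows. This is only an illustration, not part of LangChain or opensearch-js: `runSteps`, `Step`, `StepResult` and `ResponseErrorLike` are hypothetical names, and the error shape is assumed from the `meta.statusCode` field visible in the log above.

```typescript
// Assumed minimal shape of the thrown error, inferred from the log above.
interface ResponseErrorLike {
  meta?: { statusCode?: number; body?: unknown };
}

// A named async operation, e.g. a wrapped client call (hypothetical helper type).
interface Step {
  name: string;
  run: () => Promise<unknown>;
}

interface StepResult {
  name: string;
  ok: boolean;
  statusCode: number | undefined;
}

// Runs the steps in order and stops at the first failure,
// recording the HTTP status code when the error carries one.
async function runSteps(steps: Step[]): Promise<StepResult[]> {
  const results: StepResult[] = [];
  for (const step of steps) {
    try {
      await step.run();
      results.push({ name: step.name, ok: true, statusCode: undefined });
    } catch (err) {
      const statusCode = (err as ResponseErrorLike)?.meta?.statusCode;
      results.push({ name: step.name, ok: false, statusCode });
      break; // later steps depend on earlier ones, so stop here
    }
  }
  return results;
}
```

Each step would wrap one of the client calls quoted above (create, bulk, refresh), so the last entry in the returned list names the call that failed and its status code.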

To reproduce

  1. Set up an OpenSearch vector store using the following instructions: https://docs.aws.amazon.com/opensearch-service/latest/developerguide/serverless-getting-started.html. Set up the correct permissions.
  2. Set up a TypeScript Node project with langchain v0.0.102 and @opensearch-project/opensearch v2.3.1 as dependencies.
  3. Set up the vector store using the following instructions: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/opensearch

Expected behavior

The vector store should be set up successfully without throwing a 404 error.

Screenshots

[Screenshot: MicrosoftTeams-image (16)]

This function is throwing the error even though the index is created and exists.

Host / Environment

    "dependencies": {
      "@aws-sdk/client-opensearch": "^3.382.0",
      "axios": "^1.3.2",
      "body-parser": "^1.19.0",
      "csv-parse": "^5.4.0",
      "html-to-text": "^9.0.5",
      "langchain": "^0.0.102",
      "@opensearch-project/opensearch": "^2.3.1"
    },
    "devDependencies": {
      "@types/express": "^4.17.1",
      "@types/html-to-text": "^9.0.1",
      "@types/jest": "^29.5.0",
      "axios-mock-adapter": "^1.21.2",
      "jest": "^29.5.0",
      "serverless-domain-manager": "^3.3.0",
      "serverless-plugin-typescript": "^2.1.2",
      "ts-jest": "^29.1.0",
      "typescript": "^4.8.4"
    }

Additional context

No response

Relevant log output

No response

naveentatikonda commented 1 year ago

@annassommer OpenSearch Serverless doesn't support the index refresh API. Since you mentioned that the index was created and the docs were ingested, have you been able to run search queries on that index without calling the refresh API? If not, please try it and let us know if you run into any other issues.
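
One way to act on this in application code is to skip the refresh call when the endpoint is a Serverless collection. The sketch below is an assumption-laden illustration rather than anything LangChain or opensearch-js provides: `isServerlessEndpoint`, `refreshIfSupported` and `IndicesClientLike` are hypothetical names, and detecting Serverless by the `.aoss.amazonaws.com` host suffix is a guess based on the URL in the report above.

```typescript
// Minimal structural type covering only the call used here; the real
// opensearch-js client exposes indices.refresh({ index }) as quoted in the report.
interface IndicesClientLike {
  indices: { refresh(params: { index: string }): Promise<unknown> };
}

// Assumption: Serverless collection endpoints use the *.aoss.amazonaws.com host
// suffix, as in the URL from the report.
function isServerlessEndpoint(node: string): boolean {
  return /^https?:\/\/[^\/]+\.aoss\.amazonaws\.com(:\d+)?(\/|$)/i.test(node);
}

// Calls the refresh API only when the endpoint is not Serverless.
// Returns true if a refresh was issued, false if it was skipped.
async function refreshIfSupported(
  client: IndicesClientLike,
  node: string,
  index: string
): Promise<boolean> {
  if (isServerlessEndpoint(node)) {
    return false; // refresh is not supported there, so do not send the request
  }
  await client.indices.refresh({ index });
  return true;
}
```

Skipping the request avoids treating every 404 as harmless, which could otherwise hide a genuinely missing index on a non-Serverless cluster.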

annassommer commented 1 year ago

Thank you for your reply. I tried that out; however, querying does not seem to work, since LangChain provides the search body in a different format than AOSS accepts. I suspect that LangChain only supports AWS OpenSearch Ingestion, not AWS OpenSearch Serverless, and that the refresh call and the search call body would work with Ingestion. I will try to contact them regarding support for OpenSearch Serverless.