langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Vector store not initialised yet but I have called `fromDocuments` #3840

Closed PierrickLozach closed 8 months ago

PierrickLozach commented 8 months ago

When I run this code:

    const { BedrockRuntimeClient, InvokeModelCommand } = require('@aws-sdk/client-bedrock-runtime');
    const { BedrockEmbeddings } = require('langchain/embeddings/bedrock');
    const { FaissStore } = require('langchain/vectorstores/faiss');
    const { PDFLoader } = require('langchain/document_loaders/fs/pdf');

    const loader = new PDFLoader(`/tmp/${key}`, {
      splitPages: false,
    });

    const docs = await loader.load();

    // Create bedrock runtime client
    const client = new BedrockRuntimeClient({
      region,
    });

    // Load embeddings model
    const embeddings = new BedrockEmbeddings({
      model: 'amazon.titan-embed-text-v1',
      client,
      region,
    });

    const vectorStore = await FaissStore.fromDocuments(docs, embeddings);

    //FIXME: Fails with "Error: Vector store not initialised yet. Try calling `fromTexts`, `fromDocuments` or `fromIndex` first."
    await vectorStore.save('/tmp/index.faiss');

I get this error:

Error: Vector store not initialised yet. Try calling `fromTexts`, `fromDocuments` or `fromIndex` first.
    at get index [as index] (/Users/pierrick.lozach/Documents/GreenTeacket/backend/lambdas/securityqa/node_modules/langchain/node_modules/@langchain/community/dist/vectorstores/faiss.cjs:89:19)
    at FaissStore.save (/Users/pierrick.lozach/Documents/GreenTeacket/backend/lambdas/securityqa/node_modules/langchain/node_modules/@langchain/community/dist/vectorstores/faiss.cjs:164:18)
    at async module.exports.generateEmbeddings (/Users/pierrick.lozach/Documents/GreenTeacket/backend/lambdas/securityqa/ai/handler.js:85:5)
    at async MessagePort.<anonymous> (file:///Users/pierrick.lozach/Documents/GreenTeacket/backend/lambdas/securityqa/node_modules/serverless-offline/src/lambda/handler-runner/worker-thread-runner/workerThreadHelper.js:24:14)

Could you help? I have searched and tried several things but can't fix it.

dosubot[bot] commented 8 months ago

🤖

Hi @PierrickI3! I'm Dosu, a bot here to help you with your issues and questions about langchainjs, while you wait for a human maintainer. Looking forward to working together on this!

The error indicates that the vector store was not initialized before it was used. In your code, you are correctly calling `FaissStore.fromDocuments(docs, embeddings)` to initialize the vector store. However, the error might be due to the asynchronous nature of the `fromDocuments` method.

`fromDocuments` is an asynchronous method that returns a Promise. It starts executing and then immediately returns control to the calling code before it has finished. If the calling code tries to use the vector store before `fromDocuments` has finished, it will encounter this error.

To fix this, make sure `fromDocuments` has finished before you use the vector store. You can do this by putting the `await` keyword before `FaissStore.fromDocuments(docs, embeddings)`, which pauses execution until the Promise resolves.

Here is the corrected code:

    // Create bedrock runtime client
    const client = new BedrockRuntimeClient({
      region,
    });

    // Load embeddings model
    const embeddings = new BedrockEmbeddings({
      model: 'amazon.titan-embed-text-v1',
      client,
      region,
    });

    const vectorStore = await FaissStore.fromDocuments(docs, embeddings);

    // Now the vector store is properly initialized and can be used
    await vectorStore.save('/tmp/index.faiss');

In this corrected code, I've used the await keyword to ensure that the fromDocuments method has finished executing before we use the vector store. This should resolve the error you're encountering. If you have any other questions or need further clarification, feel free to ask.
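
The general point about `await` can be illustrated with a minimal sketch. `createStore` below is a hypothetical stand-in for an async factory such as `fromDocuments`, not a LangChain function. Note that the code originally posted in this thread already awaits `fromDocuments`, so this only shows the mechanism being described.

```javascript
// Hypothetical stand-in for an async factory such as fromDocuments.
async function createStore() {
  return { save: async (dir) => `saved to ${dir}` };
}

async function demo() {
  // Without await, the variable holds a pending Promise, which has no save().
  const pending = createStore();
  // With await, the variable holds the resolved store object.
  const store = await createStore();
  return {
    pendingHasSave: typeof pending.save === 'function',
    storeHasSave: typeof store.save === 'function',
  };
}
```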

For more information, you can refer to the langchainjs repository and specifically the examples/src/indexes/vector_stores/hnswlib_fromdocs.ts file which contains similar usage of the fromDocuments method.

-Dosu

PierrickLozach commented 8 months ago

The automated response is not correct. I am already using `await`.

PierrickLozach commented 8 months ago

I think I may have found the issue. `loader.load()` returns an empty array even though the PDF file is valid. Investigating...

PierrickLozach commented 8 months ago

So I have identified the issue with PDFLoader. It works for 1 PDF document but not for the other 2 that I have. It does not seem to be a size issue as the one that works is 10 MB while the other 2 are 300 KB and 52 MB.

`await loader.load()` does not throw any error; it simply returns an empty array.

Any help would be appreciated.

PierrickLozach commented 8 months ago

I found the issue. Two of the PDFs actually contained their text as graphics (images), which is why the loader returned no text at all. Closing.
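
Since the root cause here was a loader that silently returned no text, a guard before building the vector store can surface the problem earlier. The following is a minimal sketch: `assertHasText` is a hypothetical helper, not part of LangChain, and it assumes each loaded document exposes its text in a `pageContent` string.

```javascript
// Hypothetical helper (not a LangChain API): fail fast when a loader
// produced nothing that can be embedded.
function assertHasText(docs, source) {
  if (!Array.isArray(docs) || docs.length === 0) {
    throw new Error(`No documents were loaded from ${source}.`);
  }
  // Keep only documents whose pageContent holds non-whitespace text.
  const withText = docs.filter(
    (doc) => typeof doc.pageContent === 'string' && doc.pageContent.trim().length > 0
  );
  if (withText.length === 0) {
    throw new Error(
      `Documents from ${source} contain no extractable text (image-only PDF?).`
    );
  }
  return withText;
}

// Possible usage, placed before FaissStore.fromDocuments in the original code:
//   const docs = assertHasText(await loader.load(), `/tmp/${key}`);
```

With a check like this, an image-only PDF would produce an explicit "no extractable text" error instead of the misleading "Vector store not initialised yet" message at save time.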

sidharthsid7 commented 2 months ago

    // Import the required modules from langchain
    const { TextLoader } = require("langchain/document_loaders/fs/text");
    const { CharacterTextSplitter } = require("langchain/text_splitter");
    const { OpenAIEmbeddings } = require("langchain/embeddings/openai");
    const { FaissStore } = require("langchain/vectorstores/faiss");

    // Wrap your code in an async function and export it
    async function processDocuments() {
      try {
        const loader = new TextLoader("./info.txt");
        const docs = await loader.load();

        const splitter = new CharacterTextSplitter({
          chunkSize: 100,
          chunkOverlap: 50,
        });

        const documents = await splitter.splitDocuments(docs);

        const embeddings = new OpenAIEmbeddings();
        console.log(embeddings, 'vector', documents);
        const vectorstore = await FaissStore.fromDocuments(documents, embeddings);

        await vectorstore.save("./");
      } catch (error) {
        console.error("Error processing documents:", error);
      }
    }

    module.exports = { processDocuments };

When creating the vector DB, I get this error:

Error processing documents: Error: Request failed with status code 429

Do we need to use a paid API key?
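
HTTP status 429 means "Too Many Requests"; the thread does not say whether this was a rate limit or an exhausted quota. If it is a transient rate limit, retrying with exponential backoff may help. The sketch below is a generic helper, not a LangChain API: `withBackoff`, `retries`, and `baseDelayMs` are hypothetical names, and the location of the status code on the error object is an assumption.

```javascript
// Generic retry helper with exponential backoff (hypothetical, not a LangChain API).
async function withBackoff(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (error) {
      // Assumption: the HTTP status is exposed as error.response.status or
      // error.status; adjust this to the error shape you actually receive.
      const status = error?.response?.status ?? error?.status;
      // Rethrow anything that is not a 429, or a 429 after the last retry.
      if (status !== 429 || attempt >= retries) throw error;
      const delayMs = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Possible usage with the code above:
//   const vectorstore = await withBackoff(() =>
//     FaissStore.fromDocuments(documents, embeddings)
//   );
```

If the 429 comes from an exhausted quota rather than a burst of requests, retrying will not help and the account's usage limits would need to be checked instead.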