langchain-ai / langchainjs

🦜🔗 Build context-aware reasoning applications 🦜🔗
https://js.langchain.com/docs/
MIT License

Pinecone 'namespace' option not taken into consideration when using 'PineconeStore.fromDocuments' #4720

Closed: valdo99 closed this issue 4 months ago

valdo99 commented 4 months ago

Example Code

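const { PineconeStore } = require('langchain/vectorstores/pinecone');
const { OpenAIEmbeddings } = require('langchain/embeddings/openai');
const { Pinecone } = require('@pinecone-database/pinecone');

// Credentials are read from the PINECONE_* environment variables
const pinecone = new Pinecone();
const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX);

const _namespace = "test";

// `documents` is an array of LangChain Document objects prepared earlier.
// Run this inside an async function: CommonJS modules do not support top-level await.
await PineconeStore.fromDocuments(
  documents,
  new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
  }),
  {
    pineconeIndex,
    namespace: _namespace, // expected target namespace for the upsert
    maxConcurrency: 5, // Maximum number of batch requests to allow at once. Each batch is 1000 vectors.
  },
);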

Error Message and Stack Trace (if applicable)

No response

Description

I'm trying to upsert documents into a specific Pinecone namespace using PineconeStore.fromDocuments, but the documents end up in the (default) namespace instead.

fromDocuments has this definition:

static fromDocuments(docs: Document[], embeddings: Embeddings, dbConfig: PineconeLibArgs): Promise<PineconeStore>;

and the PineconeLibArgs interface it takes does accept a namespace:

export interface PineconeLibArgs extends AsyncCallerParams {
    pineconeIndex: PineconeIndex;
    textKey?: string;
    namespace?: string;
    filter?: PineconeMetadata;
}

BUT the namespace is not being used when the HTTP request to Pinecone is executed.
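A minimal sketch of why a dropped option would show up as a write to the wrong namespace rather than as an error. It assumes, without confirming it from the library source, that the store falls back to the empty string when no namespace is set; Pinecone treats `""` as the default namespace. `StoreArgs` and `effectiveNamespace` are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified mirror of the options above; only the
// fields relevant to namespacing are modeled.
interface StoreArgs {
  textKey?: string;
  namespace?: string;
}

// Assumption: an unset namespace falls back to "" (Pinecone's default
// namespace). If `namespace` is lost anywhere between the config object
// and the upsert, nothing fails; the write quietly goes to the default.
function effectiveNamespace(args: StoreArgs): string {
  return args.namespace ?? "";
}

const intended = effectiveNamespace({ namespace: "test" });
const omitted = effectiveNamespace({});

// Config coming from untyped JS or JSON bypasses TypeScript's key
// checks, so a misspelled key is silently ignored.
const misspelled: StoreArgs = JSON.parse('{"nameSpace":"test"}');
const fromTypo = effectiveNamespace(misspelled);
```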

I found a workaround: if I create a new PineconeStore instance with the namespace and then call addDocuments on it, the documents are added to the right namespace:

const pineconeIndex = pinecone.Index(process.env.PINECONE_INDEX);

const store = new PineconeStore(
  new OpenAIEmbeddings({
    openAIApiKey: process.env.OPENAI_API_KEY,
  }),
  {
    namespace: _namespace,
    maxConcurrency: 5,
    pineconeIndex,
  }
);

await store.addDocuments(documents);
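To check where upserts actually land without calling Pinecone, the routing can be exercised against an in-memory fake. `RecordingIndex` and `MiniStore` are hypothetical stand-ins, not the Pinecone client or `PineconeStore`; the only assumption is that the store writes through a call shaped like the Pinecone client's `index.namespace(ns).upsert(records)`.

```typescript
// Hypothetical in-memory stand-in for a Pinecone index. It records which
// namespace each upsert targets. This is NOT the Pinecone client.
type VectorRecord = { id: string; values: number[] };

class RecordingIndex {
  readonly upserts: { namespace: string; count: number }[] = [];

  namespace(ns: string) {
    return {
      upsert: async (records: VectorRecord[]): Promise<void> => {
        this.upserts.push({ namespace: ns, count: records.length });
      },
    };
  }
}

// Simplified store mirroring the workaround: the namespace is fixed in
// the constructor and every addDocuments call writes into it.
class MiniStore {
  constructor(
    private readonly index: RecordingIndex,
    private readonly ns?: string,
  ) {}

  async addDocuments(texts: string[]): Promise<void> {
    // Fake "embedding" (text length), which is enough for routing checks.
    const records = texts.map((t, i) => ({ id: String(i), values: [t.length] }));
    await this.index.namespace(this.ns ?? "").upsert(records);
  }
}

async function demo(): Promise<RecordingIndex> {
  const index = new RecordingIndex();
  await new MiniStore(index, "test").addDocuments(["a", "bb"]);
  await new MiniStore(index).addDocuments(["c"]); // no namespace given
  return index;
}
```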

I'll try to dive deeper on this and try to fix the issue!

System Info

"yarn info langchain" output too long :\
Node v21.5.0, macOS Sonoma, M1 (2020), Yarn 1.22.19

dosubot[bot] commented 4 months ago

Hey @valdo99! Great to see you diving into the trenches with LangChain JS again. Running into a bit of a snag, I see? Let's get to the bottom of it together!

Thank you for the detailed report. You're correct in your observation that the 'namespace' option is not directly handled by the PineconeStore.fromDocuments method. Instead, it's passed to the constructor of PineconeStore when creating a new instance. This means the 'namespace' specified in dbConfig is used to initialize the PineconeStore instance, affecting all operations performed by this instance, including adding documents.

The PineconeStore.addDocuments method, on the other hand, does not directly handle the 'namespace' option itself. Instead, it delegates to the addVectors method, which uses the 'namespace' that was set during the instantiation of the PineconeStore object. This means the 'namespace' used in addDocuments is the one that was specified when the PineconeStore instance was created, and it cannot be changed or specified per call to addDocuments.
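Because the namespace is bound when the store is constructed, writing into several namespaces means keeping one instance per namespace. A hypothetical sketch of a small cache that builds each instance once; `namespacedStores` and the dummy store are illustrative names, and in real code `makeStore` might be something like `(ns) => new PineconeStore(embeddings, { pineconeIndex, namespace: ns })`.

```typescript
// Returns a lookup function that creates a store per namespace on first
// use and reuses it afterwards.
function namespacedStores<S>(makeStore: (ns: string) => S): (ns: string) => S {
  const cache = new Map<string, S>();
  return (ns: string): S => {
    let store = cache.get(ns);
    if (store === undefined) {
      store = makeStore(ns);
      cache.set(ns, store);
    }
    return store;
  };
}

// Usage with a dummy store that only remembers its namespace.
let built = 0;
const storeFor = namespacedStores((ns) => {
  built += 1;
  return { namespace: ns };
});

const a1 = storeFor("tenant-a");
const a2 = storeFor("tenant-a");
const b = storeFor("tenant-b");
```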

Here's the relevant code for reference:

// PineconeStore.fromDocuments
static async fromDocuments(
  docs: Document[],
  embeddings: EmbeddingsInterface,
  dbConfig: PineconeLibArgs
): Promise<PineconeStore> {
  const args = dbConfig;
  args.textKey = dbConfig.textKey ?? "text";

  const instance = new this(embeddings, args);
  await instance.addDocuments(docs);
  return instance;
}

// PineconeStore.addDocuments
async addDocuments(
  documents: Document[],
  options?: { ids?: string[] } | string[]
) {
  const texts = documents.map(({ pageContent }) => pageContent);
  return this.addVectors(
    await this.embeddings.embedDocuments(texts),
    documents,
    options
  );
}
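The `options` parameter above accepts either `{ ids?: string[] }` or a bare `string[]`, and neither shape carries a namespace, which is why the target cannot be chosen per call. A hypothetical sketch of collapsing the two shapes into one id list; `resolveIds`, `pairWithIds`, and the caller-supplied `makeId` for documents without ids are assumptions chosen to keep the example deterministic, not necessarily what the library does.

```typescript
type AddOptions = { ids?: string[] } | string[] | undefined;

// Normalize both accepted shapes to an optional list of ids.
function resolveIds(options: AddOptions): string[] | undefined {
  return Array.isArray(options) ? options : options?.ids;
}

// Pair each text with an id, generating one when none was supplied.
function pairWithIds(
  texts: string[],
  options: AddOptions,
  makeId: () => string,
): { id: string; text: string }[] {
  const ids = resolveIds(options);
  if (ids !== undefined && ids.length !== texts.length) {
    throw new Error(`Got ${ids.length} ids for ${texts.length} documents`);
  }
  return texts.map((text, i) => ({ id: ids?.[i] ?? makeId(), text }));
}
```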

Your workaround of creating a new instance of PineconeStore with the namespace, then using addDocuments to add the documents in the right namespace is indeed the correct approach given the current design of the PineconeStore class.

As for the issue of documents being upserted into the default namespace instead of the specified one, it suggests that either the 'namespace' was not correctly set during the instance creation or there's an issue in how the Pinecone API handles the namespace during the upsert operation. However, based on the provided code, the 'namespace' option is correctly passed and should be considered during the upsert operation.
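One detail in the quoted `fromDocuments` is unrelated to the namespace but worth noting: `const args = dbConfig` aliases the caller's object, so assigning `textKey` also mutates it. The hypothetical sketch below contrasts that with a spread copy (`prepareArgsAliased` and `prepareArgsCopied` are illustrative names); in both versions `namespace` reaches the returned config intact, consistent with the reading that `fromDocuments` passes it through.

```typescript
interface LibArgs {
  textKey?: string;
  namespace?: string;
  maxConcurrency?: number;
}

// As quoted: `args` is the same object as `dbConfig`, so setting textKey
// writes it onto the caller's config as well.
function prepareArgsAliased(dbConfig: LibArgs): LibArgs {
  const args = dbConfig;
  args.textKey = dbConfig.textKey ?? "text";
  return args;
}

// Alternative: copy first, leaving the caller's object untouched.
// Every other field, including `namespace`, is carried over by the spread.
function prepareArgsCopied(dbConfig: LibArgs): LibArgs {
  return { ...dbConfig, textKey: dbConfig.textKey ?? "text" };
}

const cfg1: LibArgs = { namespace: "test", maxConcurrency: 5 };
const out1 = prepareArgsAliased(cfg1);

const cfg2: LibArgs = { namespace: "test", maxConcurrency: 5 };
const out2 = prepareArgsCopied(cfg2);
```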

I hope this clarifies the situation. If you have any further questions or need additional clarification, please don't hesitate to ask.
