run-llama / llama_index

[Question]: Updating metadata and text in existing pinecone index #13833

Open Prem-Nitin opened 1 month ago

Prem-Nitin commented 1 month ago

Question

I used the hierarchical parser for chunking, with Pinecone as my vector store and MongoDB as the docstore. Now I want to update the metadata and text of some chunks. How can I do that so the changes are reflected both in the nodes in the Pinecone index and in the parent nodes in the MongoDB docstore?

dosubot[bot] commented 1 month ago

To have metadata and text changes for some chunks reflected both in the nodes in the Pinecone index and in the parent nodes in the MongoDB docstore, update each store in turn:

  1. Update the nodes in the Pinecone index using the upsert method.
  2. Update the parent nodes in the MongoDB docstore with the corresponding metadata and text changes.

Here is an example of how you can achieve this:

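# Import paths assume llama-index 0.10+ with the llama-index-vector-stores-pinecone package installed.
from llama_index.core.schema import MetadataMode
from llama_index.core.vector_stores.utils import node_to_metadata_dict
from llama_index.vector_stores.pinecone.base import (
    ID_KEY,
    METADATA_KEY,
    SPARSE_VECTOR_KEY,
    VECTOR_KEY,
    generate_sparse_vectors,
)


def update_nodes_in_pinecone_and_mongodb(pinecone_vector_store, mongodb_docstore, nodes):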
    # Update nodes in Pinecone index
    ids = []
    entries = []
    for node in nodes:
        node_id = node.node_id

        metadata = node_to_metadata_dict(
            node,
            remove_text=pinecone_vector_store.remove_text_from_metadata,
            flat_metadata=pinecone_vector_store.flat_metadata,
        )

        entry = {
            ID_KEY: node_id,
            # get_embedding() raises if node.embedding is unset; re-embed first if the text changed
            VECTOR_KEY: node.get_embedding(),
            METADATA_KEY: metadata,
        }
        if pinecone_vector_store.add_sparse_vector and pinecone_vector_store._tokenizer is not None:
            sparse_vector = generate_sparse_vectors(
                [node.get_content(metadata_mode=MetadataMode.EMBED)],
                pinecone_vector_store._tokenizer,
            )[0]
            entry[SPARSE_VECTOR_KEY] = sparse_vector

        ids.append(node_id)
        entries.append(entry)

    pinecone_vector_store._pinecone_index.upsert(
        entries,
        namespace=pinecone_vector_store.namespace,
        batch_size=pinecone_vector_store.batch_size,
        **pinecone_vector_store.insert_kwargs,
    )

    # Update the same nodes in the MongoDB docstore.
    # add_documents with allow_update=True overwrites the stored copy of each node, matched by ID.
    # Parent nodes whose text or metadata should change have to be written to the docstore separately.
    mongodb_docstore.add_documents(nodes, allow_update=True)

    return ids

In this example, pinecone_vector_store is an instance of PineconeVectorStore, mongodb_docstore is an instance of MongoDocumentStore where the parent nodes are stored, and nodes is the list of nodes to be updated. The function update_nodes_in_pinecone_and_mongodb updates the nodes in both the Pinecone index and the MongoDB docstore [1][2][3][4].
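
One thing the function above takes for granted is that each node already carries an embedding that matches its new text, and that parent chunks get rewritten too. The following is a minimal, library-free sketch of that bookkeeping, not LlamaIndex code: `Node`, `toy_embed`, and `apply_update` are hypothetical stand-ins, the two stores are plain dicts, and it assumes each parent chunk contains its child's text verbatim, which may not hold exactly for every parser configuration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node (not the real class)."""
    node_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None
    embedding: Optional[List[float]] = None


def toy_embed(text: str) -> List[float]:
    """Placeholder embedding; a real setup would call the embedding model here."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def apply_update(
    node_id: str,
    new_text: str,
    new_metadata: Dict[str, str],
    docstore: Dict[str, Node],
    vector_index: Dict[str, dict],
    embed: Callable[[str], List[float]] = toy_embed,
) -> List[str]:
    """Update one leaf chunk, re-embed it, and propagate the text change upward.

    Returns the IDs of every docstore entry that was rewritten.
    """
    node = docstore[node_id]
    old_text = node.text
    node.text = new_text
    node.metadata.update(new_metadata)
    # Re-embed: the old vector no longer describes the new text.
    node.embedding = embed(new_text)

    # Upsert the leaf into the vector index (only leaves are indexed in this sketch).
    vector_index[node_id] = {
        "id": node_id,
        "values": node.embedding,
        "metadata": dict(node.metadata),
    }

    # Walk up the parent chain and patch the first occurrence of the old text.
    touched = [node_id]
    parent_id = node.parent_id
    while parent_id is not None:
        parent = docstore[parent_id]
        parent.text = parent.text.replace(old_text, new_text, 1)
        touched.append(parent_id)
        parent_id = parent.parent_id
    return touched


# Usage: one parent chunk and one leaf chunk derived from it.
docstore = {
    "root": Node("root", "Alpha beta. Gamma delta."),
    "leaf": Node("leaf", "Gamma delta.", parent_id="root"),
}
vector_index: Dict[str, dict] = {}
touched = apply_update(
    "leaf", "Gamma epsilon.", {"reviewed": "yes"}, docstore, vector_index
)
```

In a real setup the dict writes would be replaced by the Pinecone upsert and the docstore write, and `toy_embed` by whichever embedding model built the index; mixing embedding models between the original ingest and the update would make the updated vectors incomparable with the rest.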
