langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
93.08k stars 14.97k forks

Issue: I want to attach metadata in my PGVector vector store used as retriever for my ConversationChain memory #9067

Closed RafaelCantoVazquez closed 8 months ago

RafaelCantoVazquez commented 1 year ago

Issue you'd like to raise.

I am creating a chatbot that records each user's conversation history as vectors in my vector store, and I identify the owner of each vector through metadata. How can I add this metadata to the vectors if I'm using the vector store as a retriever for the memory in my ConversationChain?

This is my code:

import os
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import VectorStoreRetrieverMemory
from langchain.vectorstores.pgvector import PGVector
from langchain.embeddings.openai import OpenAIEmbeddings

exit_conditions = ("q", "quit", "exit")
metadata={"id": "John Doe", "key": 123}
llm = OpenAI(openai_api_key=os.getenv("OPENAI_API_KEY"), temperature=0)
store = PGVector(
    collection_name="chatbot_embeddings",
    connection_string=os.getenv("POSTGRES_CONNECTION_STRING"),
    embedding_function=OpenAIEmbeddings(),
    collection_metadata=metadata,
)

while True:
    query = input("> ")
    if query in exit_conditions:
        break

    conversation_with_summary = ConversationChain(
        llm=llm,
        memory=VectorStoreRetrieverMemory(retriever=store.as_retriever()),
        verbose=True,
        metadata=metadata,
    )
    print(conversation_with_summary.predict(input=query))

Suggestion:

No response

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

Hello,

To add metadata to the vectors in your PGVector vector store when it's being used as a retriever for the memory in your ConversationChain in LangChain, you can use the add_embeddings or add_texts methods of the PGVector class. Both methods accept an optional metadatas parameter which is a list of dictionaries where each dictionary represents the metadata for a corresponding vector or text.

Here is an example of how you can use these methods:

from langchain.docstore.document import Document
from langchain.vectorstores import PGVector
from langchain.embeddings.openai import OpenAIEmbeddings

CONNECTION_STRING = "postgresql+psycopg2://hwc@localhost:5432/test3"
COLLECTION_NAME = "state_of_the_union_test"
embedding_model = OpenAIEmbeddings()

# Placeholder documents; replace these with your own.
docs = [Document(page_content="example text", metadata={"author": "author0"})]

vectorstore = PGVector.from_documents(
    embedding=embedding_model,
    documents=docs,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
)

# Adding precomputed embeddings with metadata.
# The vectors come from the same embedding model so that their
# dimension matches the vectors already in the store.
texts = ["text1", "text2", "text3"]
vectors = embedding_model.embed_documents(texts)
metadatas = [{"author": "author1"}, {"author": "author2"}, {"author": "author3"}]
vectorstore.add_embeddings(texts=texts, embeddings=vectors, metadatas=metadatas)

# Adding texts with metadata (the store computes the embeddings itself)
texts = ["text4", "text5", "text6"]
metadatas = [{"author": "author4"}, {"author": "author5"}, {"author": "author6"}]
vectorstore.add_texts(texts=texts, metadatas=metadatas)

In this example, the metadatas parameter is used to add metadata to the vectors. Each dictionary in the metadatas list corresponds to a vector or text in the embeddings or texts list. The keys in the dictionary are the metadata fields and the values are the metadata values.
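Since the metadatas list has to line up one-to-one with the texts, a small helper can build it from a single per-user dictionary. This is only a sketch: build_metadatas is a hypothetical name, not part of LangChain, and the dictionary reuses the values from the original question.

```python
from typing import Any, Dict, List


def build_metadatas(texts: List[str], shared: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one metadata dict per text, each an independent copy of `shared`."""
    # Copy the dict for every text so later edits to one entry do not leak into others.
    return [dict(shared) for _ in texts]


texts = ["text4", "text5", "text6"]
metadatas = build_metadatas(texts, {"id": "John Doe", "key": 123})

# The two lists must have the same length so each text gets its own metadata.
assert len(metadatas) == len(texts)
print(metadatas[0])
```

The resulting list could then be passed as the metadatas argument of add_texts in the example above.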

You can find more details about these methods in the PGVector class in the LangChain repository.

Please note that the metadata is not automatically associated with the vectors when you use the vector store as a retriever for the memory in your ConversationChain. You need to manually add the metadata to the vectors using the add_embeddings or add_texts methods as shown in the example above.
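To illustrate why the metadata goes missing, here is a simplified, standalone model of how a retriever-backed memory might turn one conversation turn into a document. It is an assumption based on reading the library at the time, not the library's actual code: Document is a local stand-in class and form_turn_document is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    """Local stand-in for a document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def form_turn_document(inputs: Dict[str, str], outputs: Dict[str, str]) -> List[Document]:
    """Join one conversation turn into a single document (hypothetical helper)."""
    pairs = list(inputs.items()) + list(outputs.items())
    lines = [f"{key}: {value}" for key, value in pairs]
    # Only the text is set here, so metadata falls back to an empty dict.
    return [Document(page_content="\n".join(lines))]


docs = form_turn_document({"input": "Hi, I'm John"}, {"response": "Hello John!"})
print(docs[0].page_content)
print(docs[0].metadata)
```

In this model nothing on the save path knows about the user, which is why the metadata passed to the chain or to the collection never reaches the individual vectors.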

If you have any further questions or need more clarification, feel free to ask. If you think this is a feature that could be improved, we encourage you to contribute to the project and make a pull request.

Best regards, Dosu

This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

RafaelCantoVazquez commented 1 year ago

That doesn't work for me. Because the vector store is used as a retriever, the vectors from the chat memory are saved automatically rather than manually through those vector store methods.

dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

Hello,

Thank you for using LangChain and for your detailed question. It seems like you want to automatically attach metadata to the vectors in your PGVector vector store when it's being used as a retriever for the memory in your ConversationChain.

PGVector is a regular Python class, so you can extend it with a custom method that adds vectors with metadata to your vector store. This method would be similar to add_embeddings but with your custom logic for attaching metadata.

Here's an example of how you could implement this:

import uuid
from typing import Any, Iterable, List, Optional

from sqlalchemy.orm import Session


def add_custom_embeddings(
    self,
    texts: Iterable[str],
    embeddings: List[List[float]],
    metadatas: Optional[List[dict]] = None,
    ids: Optional[List[str]] = None,
    **kwargs: Any,
) -> List[str]:
    """Add embeddings to the vectorstore with custom metadata.

    Args:
        texts: Iterable of strings to add to the vectorstore.
        embeddings: List of list of embedding vectors.
        metadatas: List of metadatas associated with the texts.
        kwargs: vectorstore specific parameters
    """
    if ids is None:
        ids = [str(uuid.uuid1()) for _ in texts]

    if not metadatas:
        metadatas = [{} for _ in texts]

    with Session(self._conn) as session:
        collection = self.get_collection(session)
        if not collection:
            raise ValueError("Collection not found")
        for text, metadata, embedding, id in zip(texts, metadatas, embeddings, ids):
            # Add your custom logic for attaching metadata here
            custom_metadata = self.create_custom_metadata(metadata)
            embedding_store = self.EmbeddingStore(
                embedding=embedding,
                document=text,
                cmetadata=custom_metadata,
                custom_id=id,
                collection_id=collection.uuid,
            )
            session.add(embedding_store)
        session.commit()

    return ids

def create_custom_metadata(self, metadata: dict) -> dict:
    """Create custom metadata.

    Args:
        metadata: The original metadata.

    Returns:
        The custom metadata.
    """
    # Implement your logic for creating custom metadata here.
    # As a placeholder, start from a copy of the original metadata.
    custom_metadata = dict(metadata)
    return custom_metadata

In this example, add_custom_embeddings is a new method that you would add to the PGVector class. It's similar to add_embeddings, but it uses a new helper method create_custom_metadata to create the custom metadata. You would need to implement the create_custom_metadata method according to your needs.

Please note that any custom methods you add should be compatible with the existing structure and functionality of the PGVector class.
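A different way to stay compatible with the existing class is to override add_texts instead of adding a new method, so that automatic saves also pick up the metadata. The sketch below assumes that the memory's save path eventually reaches the store's add_texts, which has not been verified against every LangChain version. OwnerMetadataMixin, FakeStore and OwnedFakeStore are hypothetical names; FakeStore is an in-memory stand-in so the example runs without a database.

```python
from typing import Any, Dict, Iterable, List, Optional


class OwnerMetadataMixin:
    """Merge a fixed metadata dict into every add_texts call (hypothetical name)."""

    owner_metadata: Dict[str, Any] = {}

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> List[str]:
        texts = list(texts)
        # Fall back to empty dicts when the caller supplies no metadata.
        base = metadatas if metadatas else [{} for _ in texts]
        # Owner fields are applied last, so they win over caller-supplied keys.
        merged = [{**(meta or {}), **self.owner_metadata} for meta in base]
        return super().add_texts(texts, metadatas=merged, **kwargs)


class FakeStore:
    """In-memory stand-in for a vector store, used only to exercise the mixin."""

    def __init__(self) -> None:
        self.rows: List[tuple] = []

    def add_texts(self, texts, metadatas=None, **kwargs):
        self.rows.extend(zip(texts, metadatas))
        return [str(i) for i in range(len(texts))]


class OwnedFakeStore(OwnerMetadataMixin, FakeStore):
    pass


store = OwnedFakeStore()
store.owner_metadata = {"id": "John Doe", "key": 123}
store.add_texts(["input: hi\nresponse: hello"])
print(store.rows)
```

With LangChain installed, the same mixin could presumably be combined with the real class by declaring a subclass that lists OwnerMetadataMixin before PGVector, but that combination is untested here. Retrieval would still need to be restricted to the same owner, which this sketch does not cover.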

I hope this helps! If you have any other questions, feel free to ask. If you find this solution works for you, we encourage you to contribute and make a PR to help others who might encounter the same issue.

Best, Dosu

gwc4github commented 1 year ago

FYI: As best I can tell, this code does not work at all and is not really a useful starting point. Look at the code inside LangChain to understand how to use this.

dosubot[bot] commented 9 months ago

Hi, @RafaelCantoVazquez

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you are seeking guidance on adding metadata to vectors in a vector store used as a retriever for a ConversationChain memory in a chatbot. Dosubot provided a detailed response, suggesting the use of the add_embeddings or add_texts methods of the PGVector class to manually add metadata to the vectors. However, it seems that the suggested solution does not work for your specific use case, as the vectors from the chat memory are saved automatically. Dosubot then suggested creating a custom method to add vectors with metadata to the vector store, providing an example implementation. Another user, gwc4github, expressed skepticism about the provided code and recommended looking at the code inside LangChain for a better understanding.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you!

scs2learn commented 2 months ago

@RafaelCantoVazquez I am working on creating a chatbot using VectorStoreRetrieverMemory, ConversationChain and CustomLLM. Currently it's a CLI version similar to what you have described. I am using Chromadb as my vector store. In my case the conversation chain exits as soon as the LLM returns the response (verified using the debug log). ConversationChain does not return the response, and control does not go back to the question prompt within the while loop. Have you encountered such behaviour? TIA

gwc4github commented 2 months ago

@scs2learn As best I can tell this doesn't work at all, so I stopped using it and moved on to something else. (There was no support either, so there was no alternative; the bot seemed to always be wrong.)