langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Managing multiple vector stores separately #17580

Closed AndreaRiboni closed 4 months ago

AndreaRiboni commented 8 months ago

Example Code

I created three different sets of examples and, for each of them, a corresponding example selector:

self.example_selector = SemanticSimilarityExampleSelector.from_examples(
    examples,  # one of the three subsets
    HuggingFaceEmbeddings(),
    Chroma,
    k=5,
)

Error Message and Stack Trace (if applicable)

No response

Description

I am trying to manage three different SemanticSimilarityExampleSelectors. Ideally, each one of them has its own examples to choose from. I do NOT want to mix them. However, with the code I provided, Chroma mixes the examples across selectors and few-shot prompting breaks.

System Info

langchain 0.1.6

dosubot[bot] commented 8 months ago

🤖

Hey there @AndreaRiboni! Great to see you diving into another aspect of LangChain. Let's get to the bottom of this together.

Based on the information you've provided, it seems like you're trying to create separate instances of SemanticSimilarityExampleSelector for different sets of examples. However, you're encountering an issue where the examples are getting mixed up.

Each SemanticSimilarityExampleSelector instance in LangChain holds a single VectorStore object in its vectorstore field, and every example that instance knows about lives in that store. If several selectors end up backed by the same underlying store, their examples get mixed up.

Here's a snippet from the SemanticSimilarityExampleSelector class that shows this:

class SemanticSimilarityExampleSelector(BaseExampleSelector, BaseModel):
    """Example selector that selects examples based on SemanticSimilarity."""

    vectorstore: VectorStore
    """VectorStore that contains information about examples."""

When a new example is added, it is added to this VectorStore:

def add_example(self, example: Dict[str, str]) -> str:
    """Add new example to vectorstore."""
    ...
    ids = self.vectorstore.add_texts([string_example], metadatas=[example])
    return ids[0]

And when examples are selected, they are selected from this VectorStore:

def select_examples(self, input_variables: Dict[str, str]) -> List[dict]:
    """Select which examples to use based on semantic similarity."""
    ...
    example_docs = self.vectorstore.similarity_search(query, k=self.k)
    ...
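
To make the add-then-search flow above concrete, here is a minimal standard-library sketch. It is not LangChain code: `ToySelector`, `embed`, and `cosine` are hypothetical names, and the word-count "embedding" is only a stand-in for a real embedding model.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToySelector:
    """Hypothetical selector: one store per instance, add then search."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.store: List[Tuple[Counter, Dict[str, str]]] = []

    def add_example(self, example: Dict[str, str]) -> None:
        # Join the example's values (ordered by key) into one string,
        # embed it, and keep the original example as metadata.
        text = " ".join(example[key] for key in sorted(example))
        self.store.append((embed(text), example))

    def select_examples(self, input_variables: Dict[str, str]) -> List[Dict[str, str]]:
        # Embed the query the same way, then return the k closest examples.
        query = embed(" ".join(input_variables[key] for key in sorted(input_variables)))
        ranked = sorted(self.store, key=lambda item: cosine(query, item[0]), reverse=True)
        return [example for _, example in ranked[: self.k]]


selector = ToySelector(k=1)
selector.add_example({"input": "how do I sort a list", "output": "use sorted()"})
selector.add_example({"input": "how do I read a file", "output": "use open()"})
best = selector.select_examples({"input": "sort a list of numbers"})
```

The point of the sketch is that a selector only ever searches whatever its own store contains, so anything else written into that store becomes a candidate too.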

So each SemanticSimilarityExampleSelector reads from and writes to whatever VectorStore its vectorstore field points to. If several instances share the same underlying store, each of them will see all of the examples.

To manage three different SemanticSimilarityExampleSelector instances without mixing them, you can create three separate instances of the SemanticSimilarityExampleSelector class, each with its own unique set of parameters and vectorstore. Here's an example of how you can do this:

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector

# Initialize your embeddings and vectorstore class.
# Embeddings and VectorStore are abstract base classes, so use concrete ones
# (here, the two from the original question).
embeddings = HuggingFaceEmbeddings()
vectorstore_cls = Chroma

# Define your examples
examples = [...]  # replace with your list of examples

# Create three different SemanticSimilarityExampleSelector instances
selector1 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None
)

selector2 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None
)

selector3 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None
)

In this example, selector1, selector2, and selector3 are three separate instances of SemanticSimilarityExampleSelector. They will not mix with each other as they each have their own separate vectorstore and parameters.

You can adjust the parameters (like k, input_keys, etc.) for each selector as per your requirements.

Remember to replace embeddings, vectorstore_cls, and examples with your actual embeddings, vectorstore class, and examples.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

drewmccal commented 7 months ago

The dosubot is partially correct, but you also need to pass a distinct collection_name keyword argument to each from_examples() call:

selector1 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None, collection_name="collection1"
)

selector2 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None, collection_name="collection2"
)

selector3 = SemanticSimilarityExampleSelector.from_examples(
    examples, embeddings, vectorstore_cls, k=4, input_keys=None, collection_name="collection3"
)

Hope that helps.
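
For readers wondering why distinct collection names matter, here is a toy standard-library model of the behaviour. It does not use Chroma's API: `ToyClient`, `SHARED_CLIENT`, `DEFAULT_COLLECTION`, and this `from_examples` function are hypothetical names. It assumes that stores created without an explicit name resolve to one shared default collection, which is consistent with the fix above.

```python
from typing import Dict, List

DEFAULT_COLLECTION = "default"  # hypothetical default name, for illustration only


class ToyClient:
    """Hypothetical stand-in for a vector-store backend shared within one process."""

    def __init__(self) -> None:
        self.collections: Dict[str, List[str]] = {}

    def get_or_create(self, name: str) -> List[str]:
        # The same name always resolves to the same underlying collection.
        return self.collections.setdefault(name, [])


SHARED_CLIENT = ToyClient()


def from_examples(examples: List[str], collection_name: str = DEFAULT_COLLECTION) -> List[str]:
    """Add examples to the named collection and return that collection."""
    collection = SHARED_CLIENT.get_or_create(collection_name)
    collection.extend(examples)
    return collection


# Without distinct names, both calls write into the same collection.
mixed_a = from_examples(["a1", "a2"])
mixed_b = from_examples(["b1"])

# With distinct names, each call gets its own collection.
separate_a = from_examples(["a1", "a2"], collection_name="collection1")
separate_b = from_examples(["b1"], collection_name="collection2")
```

In this model, `mixed_a` and `mixed_b` are the same list and contain all three examples, while `separate_a` and `separate_b` stay independent. That is the effect the `collection_name` argument is meant to achieve in the calls above.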