langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

[Q] How to reuse QDrant collection data that was created separately with a non-default vector name? #2594

Closed: mahmoudajawad closed this issue 1 year ago

mahmoudajawad commented 1 year ago

I'm trying to use langchain in place of my current direct use of QDrant, in order to benefit from the other tools in langchain. However, I'm stuck.

I already have this code that creates QDrant collections on-demand:

    client.delete_collection(collection_name="articles")

    client.recreate_collection(
        collection_name="articles",
        vectors_config={
            "content": rest.VectorParams(
                distance=rest.Distance.COSINE,
                size=1536,
            ),
        },
    )

    client.upsert(
        collection_name="articles",
        points=[
            rest.PointStruct(
                id=i,
                vector={
                    "content": articles_embeddings[article],
                },
                payload={
                    "name": article,
                    "content": articles_content[article],
                },
            )
            for i, article in enumerate(ARTICLES)
        ],
    )

Now, if I try to reuse the client as explained in https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/qdrant.html#reusing-the-same-collection I hit the following error:

Wrong input: Default vector params are not specified in config

I seem to be able to overcome this by modifying the code of the QDrant class in langchain. However, I'd like to know whether there is an existing argument I overlooked that lets langchain work with this QDrant collection config. If not, I would like to contribute a working solution that adds a new parameter.
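To illustrate why the error above appears, here is a toy model in plain Python. It is not Qdrant's actual implementation: the `ToyCollection` class and its `search` method are hypothetical names made up for this sketch. It only mimics the idea that a collection created with named vectors (here `"content"`) has no unnamed default vector, so a search that does not say which vector to use cannot be served.

```python
from typing import Optional


class ToyCollection:
    """Hypothetical stand-in for a collection; NOT Qdrant's real code."""

    def __init__(self, vector_names: list[str]) -> None:
        # In this toy model an empty string stands for the unnamed default vector.
        self.vector_names = set(vector_names)

    def search(self, query: list[float], vector_name: Optional[str] = None) -> str:
        # A search without an explicit name falls back to the default vector.
        name = vector_name or ""
        if name not in self.vector_names:
            if name == "":
                raise ValueError("Default vector params are not specified in config")
            raise ValueError(f"Unknown vector name: {name}")
        return f"searched vector {name!r}"


# The collection from the question only defines the named vector "content".
articles = ToyCollection(vector_names=["content"])

try:
    articles.search([0.1, 0.2])  # no vector name given
except ValueError as err:
    error_message = str(err)

# Naming the vector explicitly succeeds.
result = articles.search([0.1, 0.2], vector_name="content")
print(error_message)
print(result)
```

In other words, the wrapper would need some way to pass the vector name through to the search call, which is the kind of parameter the question asks about.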

kacperlukawski commented 1 year ago

@mahmoudajawad Hi. Can you provide a code sample that ends with that error?

mahmoudajawad commented 1 year ago

@kacperlukawski here's a full example:

    import json
    import os

    import qdrant_client
    from dotenv import load_dotenv
    from langchain import OpenAI
    from langchain.chains import RetrievalQA
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import Qdrant
    from qdrant_client.http import models as rest

    load_dotenv()

    embeddings = OpenAIEmbeddings()

    client = qdrant_client.QdrantClient(
        host=os.getenv("QDRANT_HOST_STRING"),
        prefer_grpc=True,
    )

    # TODO: Add at least one article, alongside its tags, and embeddings
    articles_content: dict[str, str] = {}
    articles_tags: dict[str, list[str]] = {}
    # Embedding vectors are lists of floats
    articles_embeddings: dict[str, list[float]] = {}

    # Or load from file
    # with open("article.txt", encoding="UTF-8") as f:
    #     articles_content["article"] = f.read()
    #     articles_tags["article"] = []

    # with open("../article.json", encoding="UTF-8") as f:
    #     articles_embeddings["article"] = json.load(f)

    client.delete_collection(collection_name="articles")

    # The collection defines only a named vector, "content" (no default vector)
    client.recreate_collection(
        collection_name="articles",
        vectors_config={
            "content": rest.VectorParams(
                distance=rest.Distance.COSINE,
                size=1536,
            ),
        },
    )

    client.upsert(
        collection_name="articles",
        points=[
            rest.PointStruct(
                id=i,
                vector={
                    "content": articles_embeddings[article],
                },
                payload={
                    "name": article,
                    "content": articles_content[article],
                    "metadata": {
                        "tags": articles_tags[article],
                    },
                },
            )
            for i, article in enumerate(articles_content)
        ],
    )

    qdrant = Qdrant(
        client=client,
        collection_name="articles",
        embedding_function=embeddings.embed_query,
        metadata_payload_key="tags",
        content_payload_key="content",
    )

    retriever = qdrant.as_retriever()

    qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=retriever)

    QUERY = "WRITE QUERY RELATED TO AN ARTICLE YOU ADDED"

    # This call ends with the error quoted in the issue description
    answer = qa.run(QUERY)
    print(answer)

I have resolved this with #2751, which adds a new argument, vector_key, to the QDrant class. The argument is then handled in the similarity_search_with_score method.

dosubot[bot] commented 1 year ago

Hi, @mahmoudajawad! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue was about reusing QDrant collection data with a non-default vector name using langchain. You encountered an error when trying to reuse the client object and were seeking assistance or looking to contribute a solution that involves adding a new parameter.

I noticed that a user named "kacperlukawski" requested a code sample to help diagnose the error, and you provided a full example. It seems that you have since resolved the issue with a new argument added to the QDrant class.

Before we close this issue, could you please confirm if it is still relevant to the latest version of the LangChain repository? If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your contribution to LangChain!

mahmoudajawad commented 1 year ago

Closing it as #6871 solves the problem partially.

dosubot[bot] commented 1 year ago

Thank you @mahmoudajawad for closing the issue on LangChain! We appreciate your contribution to the repository.

Thiru-GVT commented 11 months ago

@mahmoudajawad Can I ask how you managed to solve this? I'm hitting this error now and I can't seem to solve it.

Racasekumar commented 7 months ago

I stored embeddings in a local persistent directory using this code:

    vector_store = Qdrant.from_documents(
        texts,
        embeddings,
        path="/content/drive/MyDrive/embeddings",
        collection_name="indianReviews",
    )

How can I fetch the embeddings from that existing directory?

avsolatorio commented 5 months ago

Hi, @Racasekumar! You can try:

    from langchain_community.vectorstores.qdrant import Qdrant

    # Reopen the collection persisted on disk by Qdrant.from_documents
    vector_store = Qdrant.from_existing_collection(
        embeddings=embeddings,  # this class names the parameter `embeddings`
        path="/content/drive/MyDrive/embeddings",
        collection_name="indianReviews",
    )

Racasekumar commented 5 months ago

Thank you so much, this is quite helpful. I have a small question: I am working on an e-commerce RAG application. How can I design the prompt for different tasks such as a) greeting, b) product details and specifications, c) product comparison, and d) handling irrelevant queries? Kindly help with some optimized prompts, as well as DSPy if possible.
