stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License
19.47k stars, 1.48k forks

Using DSPy with Qdrant and ColQwen2 #1882

Open kubni opened 18 hours ago

kubni commented 18 hours ago

Hello. I used the ColQwen2 model from the ColPali team to generate embeddings and stored them in my Qdrant database. I wanted to try out DSPy, but I can't get it to retrieve the points from Qdrant:

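import dspy
from dspy.retrieve.qdrant_rm import QdrantRM

# qdrant_client and collection_name are defined elsewhere
llm = dspy.OpenAI(model="gpt-4o-mini")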
retriever_model = QdrantRM(
    qdrant_collection_name=collection_name, qdrant_client=qdrant_client
)
dspy.settings.configure(lm=llm, rm=retriever_model)

retrieve = dspy.Retrieve()
question = "List all institutional stakeholders"
topK_passages = retrieve(question).passages

for passage in topK_passages:
    print(f"Passage: {passage}\n")

I get the following error:

qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 400 (Bad Request)
Raw response content:
b'{"status":{"error":"Wrong input: Vector dimension error: expected dim: 128, got 384"},"time":0.000274272}'

I have tried changing the vectorizer, but to no avail. ColQwen2 produces multi-vector embeddings, and I am not sure whether DSPy supports them. Any ideas?
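The error says the collection expects 128-dimensional vectors while the query that reached Qdrant had 384 dimensions, so the query was presumably embedded by a different model than the stored points. A minimal sketch of a pre-flight check in plain Python follows; `query_shape` and `check_query_dim` are hypothetical helpers, not part of DSPy or qdrant-client, and the vectors are dummy data.

```python
def query_shape(query):
    """Return (num_vectors, dim) for a single vector or a list of vectors."""
    if query and isinstance(query[0], (list, tuple)):
        dims = {len(v) for v in query}
        if len(dims) != 1:
            raise ValueError(f"inconsistent vector sizes: {sorted(dims)}")
        return len(query), dims.pop()
    return 1, len(query)


def check_query_dim(query, expected_dim):
    """Raise early, with a readable message, if the query dimension is wrong."""
    _, dim = query_shape(query)
    if dim != expected_dim:
        raise ValueError(f"expected dim: {expected_dim}, got {dim}")


# A multi-vector query made of 128-dimensional vectors passes the check.
check_query_dim([[0.0] * 128 for _ in range(3)], expected_dim=128)

# A single 384-dimensional vector reproduces the mismatch from the error.
try:
    check_query_dim([0.0] * 384, expected_dim=128)
except ValueError as exc:
    print(exc)
```

Running a check like this on whatever the retriever is about to send makes it easier to see which embedding model actually produced the query.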

kubni commented 18 hours ago

When I search the database the usual way (without DSPy), I do something like this:

  with torch.no_grad():
      batch_query = processor.process_queries([query_text]).to(model.device)
      query_embedding = model(**batch_query)

  multivector_query = query_embedding[0].cpu().float().tolist()

  search_result = qdrant_client.query_points(
      collection_name=collection_name,
      query=multivector_query,
      limit=top_k,
      timeout=100,
      search_params=models.SearchParams(
          quantization=models.QuantizationSearchParams(
              ignore=False,
              rescore=True,
              oversampling=2.0,
          ),
          hnsw_ef=128,
          exact=True,
      ),
  )

Here, multivector_query, which is used for searching, is a list of 27 vectors, each of size 128. Qdrant knows how to handle it because the collection is configured to use the MAX_SIM comparator.
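For readers unfamiliar with MaxSim, the idea can be sketched in plain Python: each query vector is matched with its most similar document vector, and those best scores are summed. This is an illustration under the assumption of dot-product similarity, not Qdrant's implementation; `dot` and `max_sim` are names made up for the example, and the vectors are toy 2-dimensional data.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

print(max_sim(query, doc_a))  # 2.0: each query vector finds an exact match
print(max_sim(query, doc_b))  # 1.0: each query vector scores 0.5 at best
```

Because the score depends on every individual query vector, a retriever that sends a single flat vector cannot reproduce this comparison.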

I couldn't find a way to do something similar with DSPy so far.

kuntal-c commented 15 hours ago

I believe DSPy currently assumes single-vector embeddings, which rules out native multi-vector search. Multi-vector support could be added in the future by extending the DSPy retrievers; you could also submit a feature request to the DSPy team.

In the meantime, you can integrate your custom logic using one of the options below:

Option A: Modify DSPy to handle multi-vector embeddings. To make DSPy compatible with your setup, you can subclass the QdrantRM retriever model to handle multi-vector embeddings and pass the appropriate embedding structure into Qdrant for querying.

Option B: Preprocess your embeddings into single vectors. If your use case does not strictly require multi-vector search, consider reducing your embeddings to a single vector.

  1. Pooling (e.g., mean, max): Aggregate the 27 vectors into one by averaging or taking the maximum.
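  2. Dimensionality reduction (e.g., PCA, UMAP): Project the vectors down to the 128 dimensions your Qdrant collection expects. Note that the 384-dimensional vector in the error most likely comes from the retriever's vectorizer rather than from ColQwen2, whose vectors are already 128-dimensional.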
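The pooling step can be sketched in plain Python as follows. `mean_pool` and `max_pool` are hypothetical helper names and the input is toy data; with the real embeddings, the input would be the 27 vectors of size 128 and the output a single vector of size 128. A pooled query is only meaningful against a collection whose stored points were pooled the same way, and pooling discards the per-token matching that MAX_SIM relies on.

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def max_pool(vectors):
    """Take the element-wise maximum over a list of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


multivector = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]

print(mean_pool(multivector))  # [2.0, 3.0, 4.0]
print(max_pool(multivector))   # [3.0, 4.0, 5.0]
```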

Option C: Use your custom search logic with DSPy. If neither of the above options works for you, you can bypass DSPy's retriever and use your own Qdrant querying logic directly alongside DSPy's LLM functionality.
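A rough sketch of how Option C might be wired together is shown below. It is written with plain callables so it runs without a database or an API key: `answer_with_custom_retrieval`, `fake_retrieve`, and `fake_generate` are hypothetical names for this example. In a real pipeline, `retrieve` would wrap the `qdrant_client.query_points(...)` call shown earlier in this thread, and `generate` could be a DSPy module such as `dspy.Predict("context, question -> answer")`; that wiring is an assumption and is not tested here.

```python
def answer_with_custom_retrieval(question, retrieve, generate, top_k=3):
    """Retrieve passages with custom logic, then hand them to the LM step."""
    passages = retrieve(question, top_k)
    context = "\n\n".join(passages)
    return generate(context=context, question=question)


# Stand-in for the custom Qdrant multi-vector search (hypothetical data).
def fake_retrieve(question, top_k):
    corpus = ["Passage about stakeholders.", "Passage about budgets."]
    return corpus[:top_k]


# Stand-in for the LM step; a DSPy module would take the same keyword inputs.
def fake_generate(context, question):
    return f"Q: {question} | context length: {len(context)}"


print(answer_with_custom_retrieval(
    "List all institutional stakeholders", fake_retrieve, fake_generate, top_k=1
))
```

Keeping retrieval behind a plain function means the multi-vector search can change without touching the LM side.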

kubni commented 15 hours ago

Thanks for the reply. I will try some of the ideas you proposed.

okhat commented 14 hours ago

Hey @kubni, does this all need to happen in DSPy? Just do the retrieval the way you normally would. Use DSPy for the LM modules, not for the multi-vector search.

kubni commented 14 hours ago

Okay, thanks for the heads up, @okhat. I am still new to DSPy, so I am trying to see how various things work and fit together. I will definitely try using the LM modules.

kubni commented 14 hours ago

Hey @okhat. My RAG pipeline retrieves a Qdrant point containing a base64-encoded image from the database and then passes the image to a VLM such as gpt-4o-mini. Is this possible with the DSPy LM modules? I see some closed PRs, but they don't appear to have been merged.

okhat commented 5 hours ago

@kubni Basically, you don't need DSPy features for the retrieval parts, since you want to use Qdrant or multi-vector serving or whatnot. Rely on DSPy only for the LM steps.