stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

KeyError: 'content' error trying rag with weaviate and ollama #1152

Closed FATIHISILGAN closed 4 months ago

FATIHISILGAN commented 4 months ago

I want to do RAG with models I downloaded to my local machine through Ollama, using the Weaviate vector DB and DSPy. I wrote some sample code, but whichever model I try, I get the error KeyError: 'content'.

Here are the parts of my sample code:

Connect to Weaviate

import weaviate
import weaviate.classes as wvc
from weaviate.collections.classes.grpc import Move
import dspy

client = weaviate.connect_to_local()

Create a simple collection of movies with title and description.


from weaviate.classes.config import Configure, Property, DataType

vectorizer = Configure.Vectorizer.multi2vec_clip(
    text_fields=['title', 'description']
)

client.collections.create(
    "Movie",
    vectorizer_config=vectorizer,  
    properties=[
        Property(name="title", data_type=DataType.TEXT, skip_vectorization=False),  
        Property(name="description", data_type=DataType.TEXT, skip_vectorization=False)
    ]
) 

Insert movies

Movie = client.collections.get("Movie")
movies = [
    {"title": "Inception", "description": "A mind-bending thriller"},
    {"title": "The Matrix", "description": "A hacker discovers a shocking truth"},
    {"title": "Interstellar", "description": "A journey beyond the stars"},
    {"title": "The Prestige", "description": "Two magicians compete for supremacy"}
]

for movie in movies:
    data = {
        "title": movie["title"],
        "description": movie["description"]
    }
    Movie.data.insert(data)

Here is the RAG part, which is also where I get the error.

lama_ollama = dspy.OllamaLocal(model="llama3", max_tokens=4000, timeout_s=480)

from dspy.retrieve.weaviate_rm import WeaviateRM

# WeaviateRM
retriever_model = WeaviateRM(
    weaviate_collection_name="Movie",
    weaviate_client=client,
    k=10
)

dspy.settings.configure(lm=lama_ollama, rm=retriever_model)

query = "I'm looking for a mind-bending thriller."

retriever = dspy.Retrieve(3)
recommendations = retriever(query)
print(recommendations)

Here is the full error


KeyError                                  Traceback (most recent call last)
Cell In[91], line 19
     15 query = "I'm looking for a mind-bending thriller."
     18 retriever = dspy.Retrieve(3)
---> 19 recommendations = retriever(query)
     20 print(recommendations)

File c:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\dspy\retrieve\retrieve.py:30, in Retrieve.__call__(self, *args, **kwargs)
     29 def __call__(self, *args, **kwargs):
---> 30     return self.forward(*args, **kwargs)

File c:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\dspy\retrieve\retrieve.py:39, in Retrieve.forward(self, query_or_queries, k, **kwargs)
     36 # print(queries)
     37 # TODO: Consider removing any quote-like markers that surround the query too.
     38 k = k if k is not None else self.k
---> 39 passages = dsp.retrieveEnsemble(queries, k=k,**kwargs)
     40 return Prediction(passages=passages)

File c:\Users\ASUS\AppData\Local\Programs\Python\Python39\lib\site-packages\dsp\primitives\search.py:57, in retrieveEnsemble(queries, k, by_prob, **kwargs)
     54 queries = [q for q in queries if q]
     56 if len(queries) == 1:
---> 57     return retrieve(queries[0], k, **kwargs)
     59 passages = {}
     60 for q in queries:
...
---> 94     parsed_results = [result.properties[self._weaviate_collection_text_key] for result in results.objects]
     95     passages.extend(dotdict({"long_text": d}) for d in parsed_results)
     97 # Return type not changed, needs to be a Prediction object. But other code will break if we change it.

KeyError: 'content'

By the way, I also tried the Weaviate client v3, but that doesn't work either.

arnavsinghvi11 commented 4 months ago

Hi @FATIHISILGAN ,

It looks like the WeaviateRM requires you to specify the weaviate_collection_text_key as well or else it defaults to ‘content’.

Let me know if setting that helps.
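To illustrate why the default fails here, below is a minimal standard-library sketch of the lookup shown on line 94 of the traceback. The `extract_passages` helper and the `SimpleNamespace` stand-ins are hypothetical and only mimic that one line; they are not DSPy or Weaviate code.

```python
from types import SimpleNamespace

# Stand-ins for the objects a Weaviate query returns; only `.properties` matters here.
# The "Movie" collection in this issue has "title" and "description", but no "content".
objects = [
    SimpleNamespace(properties={"title": "Inception", "description": "A mind-bending thriller"}),
    SimpleNamespace(properties={"title": "The Matrix", "description": "A hacker discovers a shocking truth"}),
]


def extract_passages(objects, text_key="content"):
    """Hypothetical helper mimicking the per-object lookup from the traceback."""
    return [obj.properties[text_key] for obj in objects]


missing_key = None
try:
    # With the default key, the lookup fails because no property is named "content".
    extract_passages(objects)
except KeyError as err:
    missing_key = err.args[0]

# Pointing the key at a property that exists makes the same lookup succeed.
passages = extract_passages(objects, text_key="description")
```

So the error comes from the property name used for the lookup, not from the Ollama model, which would explain why switching models made no difference.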

FATIHISILGAN commented 4 months ago

Yes, it worked. I passed one of the property names of my Weaviate collection to the weaviate_collection_text_key parameter, and the KeyError: 'content' error is gone.
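For anyone landing here later, a sketch of what the change might look like for the "Movie" collection above. The `build_movie_retriever` helper is hypothetical, and choosing "description" as the text key is an assumption; any text property of the collection should work the same way.

```python
def build_movie_retriever(client, text_key="description", k=3):
    """Hypothetical helper: build a WeaviateRM that reads passages from `text_key`."""
    # Imported lazily so this sketch can be loaded without dspy installed.
    from dspy.retrieve.weaviate_rm import WeaviateRM

    return WeaviateRM(
        weaviate_collection_name="Movie",
        weaviate_client=client,
        # Without this argument the retriever falls back to "content",
        # which the Movie collection does not define.
        weaviate_collection_text_key=text_key,
        k=k,
    )


# Usage, assuming `client` and `lama_ollama` are set up as in the original post:
# dspy.settings.configure(lm=lama_ollama, rm=build_movie_retriever(client))
```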

Thank you @arnavsinghvi11