Open Gwenn-LR opened 3 months ago
+1, thanks! Really surprising to see you can't return any kind of metadata out of your retriever, e.g. to show sources to the user.
3 months without any answer from the team. I've just received a notification that my PR has been closed without any further explanation, and I've checked: the issue does not seem to have been patched in the main branch.
Facing the same issue here. I just want to know the probabilities and scores of the retrieved docs. The RM code is still not fixed in the latest release.
Description
While following the tutorial [02] Multi-Hop Question Answering, adapted to my system (an LM hosted and served by a local Ollama server, and an RM hosted locally via Chroma), I could not get interpretable scores because my database structure differed from yours. Since the tutorial gave no indication, I decided to store the title of each wiki page as metadata on each corresponding chunk, whereas (I think) you included it as part of the context. As a result, a metric based on comparing the examples' `gold_titles` with `normalized_text` from the context could not reach a satisfactory score as it did in your case. That's why I tried to add metadata to my `Prediction`, and that is where the issues appeared.
Package version
python: 3.10.12
dspy-ai: 2.4.13
Issue
First, when I call the retriever with the `with_metadata` attribute set to `True`, it calls `dsp.primitives.search.retrieveEnsemblewithMetadata`, which in turn calls `dsp.primitives.search.retrieveRerankEnsemblewithMetadata` when `dsp.settings` has no `reranker` attribute. That function then raises `AssertionError: Both RM and Reranker are needed to retrieve & re-rank.`, since there is no `reranker`, as was tested just before.

Once this issue is solved, the `dsp.primitives.search.retrieveEnsemblewithMetadata` method calls `dsp.primitives.search.retrieve` when there is only one query (which is my case), and that function does not extract metadata at all. I don't think any of the methods in `dsp.primitives.search` extract metadata.

Finally, I've tried to fix at least the method for my case and defined my
`passages` variable as a dictionary, as indicated in your code: https://github.com/stanfordnlp/dspy/blob/af5186cf07ab0b95d5a12690d5f7f90f202bc86e/dspy/retrieve/retrieve.py#L93C1-L94C63
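For illustration, here is a minimal sketch of what dictionary-shaped passages carrying a page title in their metadata could look like, and how those titles might then be compared with `gold_titles`. The key names (`long_text`, `metadatas`, `title`) and the helpers `normalize`, `titles_found` and `gold_titles_covered` are assumptions made for this example; they are not taken from DSPy's or Chroma's actual schema.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation and collapse whitespace.

    Hypothetical stand-in for a text normalizer; not the DSPy one.
    """
    lowered = text.lower()
    no_punct = "".join(ch for ch in lowered if ch not in string.punctuation)
    return " ".join(no_punct.split())


# Hypothetical passage shape: the chunk text plus a metadata dict with the title.
passages = [
    {"long_text": "Paris is the capital of France.", "metadatas": {"title": "Paris"}},
    {"long_text": "The Seine flows through Paris.", "metadatas": {"title": "Seine"}},
]


def titles_found(retrieved: list[dict]) -> set[str]:
    """Collect normalized titles from the metadata of each retrieved passage."""
    return {normalize(p["metadatas"]["title"]) for p in retrieved}


def gold_titles_covered(gold_titles: list[str], retrieved: list[dict]) -> bool:
    """True when every gold title appears among the retrieved titles."""
    return {normalize(t) for t in gold_titles} <= titles_found(retrieved)
```

With this shape, the title check no longer depends on the title being embedded in the passage text itself.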
However, `dspy.retrieve.retrieve.single_query_passage` seems to be written for multiple passages, unlike what its name suggests. In my case it generates a `Prediction` whose `passages` attribute is a list nested in a list, which leads to an error when I try to clean my `context + passages` with `dsp.utils.deduplicate` (since a list can't be hashed).
Possible solution
I'll open a PR to solve those issues. I think the first one is just a typo. The fix for the second should respect your syntax, but it might be tightly linked to the later issues I've faced, so I would like to know if you could help me solve those. Thank you for your devoted attention to this matter.
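The hashing failure described in the Issue section can be reproduced without DSPy. The sketch below uses `dedup_keep_order`, a hypothetical stand-in for a set-based, order-preserving deduplication (it is not `dsp.utils.deduplicate` itself), and `flatten_once`, a hypothetical workaround that unwraps one level of nesting before deduplicating. The sample strings are made up.

```python
def dedup_keep_order(items: list) -> list:
    """Drop repeated items while keeping first-seen order.

    Membership is tracked in a set, so every item must be hashable.
    """
    seen = set()
    unique = []
    for item in items:
        if item not in seen:  # raises TypeError when item is a list
            seen.add(item)
            unique.append(item)
    return unique


def flatten_once(items: list) -> list:
    """Unwrap one level of nested lists, leaving other items untouched."""
    flat = []
    for item in items:
        if isinstance(item, list):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


context = ["passage A"]
nested_passages = [["passage A", "passage B"]]  # a list in a list, as reported

try:
    dedup_keep_order(context + nested_passages)
    error = None
except TypeError as exc:
    error = str(exc)  # the nested list cannot be hashed

# Flattening first makes every element a hashable string again.
cleaned = dedup_keep_order(flatten_once(context + nested_passages))
```

Flattening at the call site is only a workaround under these assumptions. A fix inside the retriever that returns a flat list of passages would avoid the need for it.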