stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy.ai
MIT License

passages_per_hop changes results even when context [1] provides the correct answer #1510

Open wzds2015 opened 1 month ago

wzds2015 commented 1 month ago

I am following the tutorials to learn DSPy, and I am now testing SimplifiedBaleen. I found that when I choose passages_per_hop = 2 or 10, the results are different. I further checked the retrieved context: the correct answer is always in context [1] no matter how many passages are retrieved. Does anyone know the reason behind this? I'd appreciate it if you could provide a code pointer.

A side question: I am using llama3.1:70b for this testing, and I found that on SimplifiedBaleen it performs much worse than GPT-3.5 and Mixtral. Does this make sense? Metrics below:

GPT-3.5 (shown in the tutorial) = 60%
Mixtral = 48%
Llama3.1:70b = 20%
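
For reference, a sketch of how I compute these numbers, assuming the notebook's HotPotQA dev set format and an answer-exact-match metric (`baleen` is the program under test and is only shown as a placeholder here):

```python
import dspy
from dspy.evaluate import Evaluate

# One dev example in the notebook's format (the real run uses the full dev set).
devset = [dspy.Example(question="How many storeys are in the castle that David Gregory inherited?",
                       answer="five").with_inputs("question")]

# `baleen` is the SimplifiedBaleen program under test (compiled or not).
evaluate = Evaluate(devset=devset, metric=dspy.evaluate.answer_exact_match,
                    num_threads=1, display_progress=True)
score = evaluate(baleen)  # prints per-example results and returns the % score
```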

arnavsinghvi11 commented 1 month ago

Hi @wzds2015 ,

The correct answer is always in context [1] no matter how many passages are retrieved. Does anyone know the reason behind this?

The SimplifiedBaleen pipeline iteratively accumulates a set of passages in the context, and each hop only appends newly retrieved passages after the first x. Since the context always starts with the first few retrieved passages, adding more passages never displaces them; that is why, if the correct answer is already in context[1], it is found there consistently regardless of how many passages are retrieved!
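
For a code pointer, the relevant logic is SimplifiedBaleen's forward loop in the intro notebook; an abridged sketch is below (deduplicate keeps the first occurrence of each passage, so earlier passages stay at the front of context):

```python
import dspy
from dsp.utils import deduplicate

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()
        self.generate_query = [dspy.ChainOfThought("context, question -> query") for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
        self.max_hops = max_hops

    def forward(self, question):
        context = []
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            # New passages are appended after the ones already in context,
            # so context[1] stays the same no matter what passages_per_hop is.
            context = deduplicate(context + passages)
        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)
```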

Regarding the Llama3.1 testing, the current intro notebook has not been adapted for the latest chat models (that adaptation is an upcoming change in DSPy v2.5).

wzds2015 commented 1 month ago

In my case, if I choose passages_per_hop = 2 or 10, the results are different. From your answer, it seems this shouldn't happen. Do you have any idea?

passages_per_hop = 2: Answer: Five storeys
passages_per_hop = 10: Answer: The passage does not provide information about the number of stories in the castle that David Gregory inherited.
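
A minimal sketch of what I'm comparing (assuming the SimplifiedBaleen class from the notebook; the question is the notebook's dev example):

```python
# Side-by-side run of the uncompiled program with two retrieval depths.
# `SimplifiedBaleen` is assumed to be the class from the intro notebook.
question = "How many storeys are in the castle that David Gregory inherited?"

for k in (2, 10):
    baleen = SimplifiedBaleen(passages_per_hop=k)
    prediction = baleen(question=question)
    print(f"passages_per_hop={k}: {prediction.answer}")
```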

okhat commented 1 month ago

Models are known to struggle with lots of context. In any case, Llama3.1:70b should definitely not get 20%. It should be much higher.

okhat commented 1 month ago

Do try this with dspy.LM as in the migration guide, though, and let us know!

Migration guide: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb
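
Roughly, the migrated setup looks like this (a sketch assuming Llama 3.1 70B served locally via Ollama; the endpoint below is a placeholder):

```python
import dspy

# New-style LM client from the migration guide. The model string follows
# LiteLLM's provider/model convention; api_base is a placeholder endpoint.
lm = dspy.LM("ollama_chat/llama3.1:70b", api_base="http://localhost:11434")
dspy.configure(lm=lm)

# Existing modules (e.g. SimplifiedBaleen) run unchanged against the new client.
```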