stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

extend generation logic tests #1172

Closed. mikeedjones closed this 3 months ago

mikeedjones commented 3 months ago

Added tests to better define the behaviour of the extend generation logic in dsp/primitives/predict.py.

They currently don't pass with either version of the extend generation logic! I'm not sure what the intended behaviour should be. Could @XenonMolecule, @okhat, and @arnavsinghvi11 please explain what these tests should look like?

Cheers!

okhat commented 3 months ago

Hey @mikeedjones, thanks so much for the deep dive! The changes do need to be reverted for now, because they break more fundamental things than the regressions you mentioned, though those regressions are important too.

In the longer run, I like the direction of this PR overall. Let's think about what the right long-term behavior for parsing is.

The original DSPy behavior, which strikes a very good compromise IMO but needs to be better documented, is: when you ask for n=1 completion, you'll always get it back. If you request n>1 completions, you get at least one, but with no guarantees beyond that. If you need guaranteed n>1 behavior, create multiple modules with temperature=0.7 + i*0.001.
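A minimal sketch of that workaround (assuming the string-signature form of `dspy.Predict` and passing `temperature` through its config; the signature, question, and `n` are made up for illustration):

```python
import dspy

# Sketch of the guaranteed-n workaround described above. Assumes an LM has
# already been configured via dspy.settings.configure(lm=...).
n = 4
predictors = [
    dspy.Predict("question -> answer", temperature=0.7 + i * 0.001)
    for i in range(n)
]
completions = [
    p(question="What is a good garnish for pasta?").answer for p in predictors
]
```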

mikeedjones commented 3 months ago

Hi @okhat, I think what I show above is that the reverted logic isn't currently working as I expected?

The original logic fills the missed field with an empty string and the completion continues from there, so the model would be prompted to start from Sauce as opposed to Garnish, and the user has to accept the unfilled field? Is that the intended behaviour?
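Roughly, my reading of it is something like this (an illustrative sketch only, with made-up field names, not the actual dsp/primitives/predict.py code):

```python
# Illustrative sketch only, not the actual dsp/primitives/predict.py code.
fields = ["sauce", "garnish"]
parsed = {"sauce": "tomato"}  # the model stopped before generating "garnish"

for name in fields:
    # a missed field is filled in with an empty string and generation
    # continues from that point, so the empty value survives into the
    # prediction the user receives
    parsed.setdefault(name, "")
```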

Given that #920 was merged, I think it makes sense to revert and add some tests so someone else can't inadvertently break something further down the line!

mikeedjones commented 3 months ago

Updated the tests and added a comment to each, including the logged LM calls. This branch has been pointing at https://github.com/stanfordnlp/dspy/tree/mipro_v2, and the tests are passing for the reverted logic. I think the dummy LM generations are representative, but I'm not sure the behaviour demonstrated by the tests is the desired one? A sketch of the test shape is below.
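For reference, the tests have roughly this shape (a simplified sketch: the `DummyLM` helper is DSPy's test utility, and the signature and canned answer here are illustrative, not this PR's actual fixtures):

```python
import dspy
from dspy.utils import DummyLM

def test_n1_completion_is_always_returned():
    # DummyLM returns canned strings instead of calling a real model
    dspy.settings.configure(lm=DummyLM(["Paris"]))
    result = dspy.Predict("question -> answer")(question="What is the capital of France?")
    assert result.answer == "Paris"
```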