stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Extra generations cause max_tokens of AWSModels to halve permanently each time #1465

Closed jisikoff closed 1 month ago

jisikoff commented 2 months ago

Extra generations in the Predict Module here:

https://github.com/stanfordnlp/dspy/blob/main/dsp/primitives/predict.py#L96

halve the max_tokens value in the kwargs and retry. I believe this halving is supposed to be temporary: the code reads the value from the globally shared dsp.settings.lm.kwargs["max_tokens"] on the LM object and passes the halved value in as kwargs for that generation only.
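
Roughly, the retry path does something like this (a paraphrase, not the exact source; the 75 floor is inferred from the behavior described below):

```python
# Paraphrase of the retry logic as I understand it (not the actual source).
# On an incomplete generation, re-issue the call with a halved max_tokens
# read from the globally shared lm.kwargs, floored at 75.
def retry_kwargs(lm_kwargs: dict, call_kwargs: dict) -> dict:
    halved = max(75, lm_kwargs["max_tokens"] // 2)
    # Intended to affect this one generation only:
    return {**call_kwargs, "max_tokens": halved}

print(retry_kwargs({"max_tokens": 8192}, {}))  # {'max_tokens': 4096}
```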

However, this halving is made permanent by code in the AWS models module, which takes that lm.kwargs dictionary as a reference off self and writes the halved max_tokens back into it. All future generations then start from the halved value, so after enough runs every generation ends up at the floor of max_tokens=75.

I believe the code that writes the halved max_tokens back onto the LM object is here:

https://github.com/stanfordnlp/dspy/blob/main/dsp/modules/aws_models.py#L221-L223
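
A minimal illustration of the aliasing (not the actual DSPy source, just the pattern as I understand it):

```python
# The LM keeps self.kwargs as a shared dict; writing the per-call
# max_tokens back into it makes the "temporary" halving permanent.
class FakeAWSModel:
    def __init__(self, **kwargs):
        self.kwargs = kwargs  # same dict object as dsp.settings.lm.kwargs

    def basic_request(self, prompt, **kwargs):
        merged = {**self.kwargs, **kwargs}
        self.kwargs["max_tokens"] = merged["max_tokens"]  # the write-back
        return f"max_tokens used: {merged['max_tokens']}"

lm = FakeAWSModel(max_tokens=8192)
print(lm.basic_request("hi", max_tokens=8192 // 2))  # retry passes 4096...
print(lm.basic_request("hi"))                        # ...next call starts at 4096
```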

arnavsinghvi11 commented 1 month ago

Hi @jisikoff, this is generalizable to all LM calls in DSPy, not just AWSModels. This PR has some relevant discussion of the current behavior.

jisikoff commented 1 month ago

So the desired behavior is for the global state of the LM to be corrupted whenever there are retries? The only solution I've found is to call

dsp.settings.lm.kwargs["max_tokens"] = 8192

before every module call, in case there was a retry in the previous invocation. Otherwise the LM eventually settles at 75 permanently again.
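
For reference, a sketch of that reset as a small wrapper (the helper name and constant are mine, not DSPy API):

```python
import dsp

ORIGINAL_MAX_TOKENS = 8192  # whatever the model was originally configured with

def call_with_fresh_budget(module, *args, **kwargs):
    # Undo any halving a retry in a previous invocation left behind.
    dsp.settings.lm.kwargs["max_tokens"] = ORIGINAL_MAX_TOKENS
    return module(*args, **kwargs)
```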

okhat commented 1 month ago

Thanks for opening this! We released DSPy 2.5 yesterday. I think the new dspy.LM and the underlying dspy.ChatAdapter will probably resolve this problem.

Here's the (very short) migration guide; changing the LM definition should typically take you 2-3 minutes, and then you should be good to go: https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb
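
For example, the 2.5-style setup looks roughly like this (the model name is a placeholder; see the notebook for the exact steps):

```python
import dspy

# Configure a single global LM in the new style; max_tokens is passed
# through per request rather than mutated on shared state.
lm = dspy.LM("openai/gpt-4o-mini", max_tokens=8192)
dspy.configure(lm=lm)
```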

Please let us know if this resolves your issue. I will close for now but please feel free to re-open if the problem persists.