stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License

MIPROv2 looks for the single best demonstration instead of the best combination of demonstrations #1789

Closed lamflokas closed 1 week ago

lamflokas commented 1 week ago

I was reading the MIPROv2 code to understand how the Bayesian Optimization problem is formulated.

I was surprised to see that the code models the selection of demo examples as a single categorical variable per predictor. This would translate to finding the best single demo example for each predictor. In contrast, the paper aims to find

combinations of demonstrations (within and across modules)

To model that, I was expecting an array of boolean variables per predictor (one for each demo example). Each variable would indicate whether the corresponding example is selected (assuming that the order of demonstrations in the prompt does not matter).
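
The boolean formulation described above can be sketched in plain Python. This is only an illustration, not DSPy code: the demo pool and the helper `decode_mask` are hypothetical names, and the exhaustive enumeration is there just to show how quickly the space grows.

```python
from itertools import product

# Hypothetical pool of demos for a single predictor (illustrative only).
demos = ["demo_a", "demo_b", "demo_c"]


def decode_mask(mask, pool):
    """Return the demos whose boolean variable is True, in pool order."""
    return [demo for demo, keep in zip(pool, mask) if keep]


# Enumerate every assignment of the boolean variables (one per demo).
all_subsets = [
    decode_mask(mask, demos)
    for mask in product([False, True], repeat=len(demos))
]

# With N demos there are 2**N possible subsets for this one predictor.
search_space_size = len(all_subsets)
```

With N demos per predictor this gives 2**N subsets per predictor, and the sizes multiply across predictors, which is why this formulation becomes expensive to search.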

The same observations about demo example selection seem to be true about MIPRO as well.

Do MIPRO and MIPROv2 focus on single-example selection intentionally, to reduce the search space? Or is this a bug?

okhat commented 1 week ago

Hey @lamflokas! For each module, MIPRO has a single categorical variable for demonstration(s). Each value of that variable is a list of demonstrations, which may have one or more items. This is consistent with the paper and is indeed meant to cut down the search space.
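
A minimal sketch of the formulation described in this reply, assuming the candidate demo lists are built ahead of time. None of this is DSPy's actual code: `candidate_sets`, `sample_configuration`, `resolve`, and `search_space_size` are hypothetical names, and uniform random sampling stands in for the Bayesian optimizer.

```python
import random

# Hypothetical pre-built candidate demo lists, one collection per predictor.
# Each candidate is itself a list of demos, so one categorical choice
# selects a whole combination of demonstrations.
candidate_sets = {
    "predictor_0": [[], ["demo_a"], ["demo_a", "demo_b"], ["demo_c", "demo_d", "demo_e"]],
    "predictor_1": [[], ["demo_x", "demo_y"]],
}


def sample_configuration(candidates, rng):
    """Pick one candidate index per predictor (one categorical variable each)."""
    return {name: rng.randrange(len(sets)) for name, sets in candidates.items()}


def resolve(choice, candidates):
    """Map each chosen index back to its list of demonstrations."""
    return {name: candidates[name][idx] for name, idx in choice.items()}


def search_space_size(candidates):
    """Number of joint configurations: product of candidate counts."""
    size = 1
    for sets in candidates.values():
        size *= len(sets)
    return size


rng = random.Random(0)  # stand-in for an optimizer proposing a trial
choice = sample_configuration(candidate_sets, rng)
selected = resolve(choice, candidate_sets)
```

Here the space has one option per candidate list rather than one per subset of demos, so its size is controlled by how many candidate lists are generated for each predictor.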

lamflokas commented 1 week ago

Thank you for the quick response!