stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License

Support for continuously valued validation metric "losses" #145

Closed elyxlz closed 10 months ago

elyxlz commented 1 year ago

Is it possible for validation metrics to return non-boolean answers?

This could bring a lot more control over the optimization strategy, e.g. loss weighting for different metrics.

Another idea is natural language "losses", kind of like RLAIF. AKA self-critique.

okhat commented 1 year ago

This is absolutely on the short-term roadmap. We just need to work out exactly how to use these scores when building demonstrations.

Right now, you can implement a threshold function (if score > threshold, return True).

Did you have a particular pattern in mind?
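A minimal sketch of the threshold workaround mentioned above. The scoring function (`similarity_score`), the `answer` field, and the 0.5 cutoff are illustrative assumptions, not part of DSPy; only the `metric(example, pred, trace=None)` call shape follows the usual DSPy metric convention.

```python
from types import SimpleNamespace


def similarity_score(example, pred):
    """Hypothetical continuous score in [0, 1]: Jaccard overlap of answer tokens."""
    gold = set(example.answer.lower().split())
    guess = set(pred.answer.lower().split())
    if not gold and not guess:
        return 1.0
    return len(gold & guess) / len(gold | guess)


def thresholded_metric(example, pred, trace=None, threshold=0.5):
    """Boolean metric: passes only when the continuous score clears the threshold."""
    return similarity_score(example, pred) > threshold


# Stand-ins for a DSPy example and prediction (assumed to expose `.answer`).
example = SimpleNamespace(answer="the capital of France is Paris")
good = SimpleNamespace(answer="Paris is the capital of France")
bad = SimpleNamespace(answer="Berlin")

print(thresholded_metric(example, good))  # True
print(thresholded_metric(example, bad))   # False
```

The cutoff discards how far a prediction is from passing, which is the loss of control the original question is about.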

okhat commented 1 year ago

Oh btw. Continuous metrics are already fully supported for optimization but not for candidate selection.

This may already be all you need.

If trace is None, you can return a continuous score and that will be used for random search or Optuna optimization.
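As a rough sketch of that pattern, a metric can return a float when `trace` is None and fall back to a pass/fail decision otherwise. The token-level F1 scorer, the `answer` field, and the 0.8 threshold are assumptions chosen for illustration; how a given optimizer consumes the float is not shown here.

```python
from types import SimpleNamespace


def f1_score(gold: str, guess: str) -> float:
    """Token-level F1 between two strings, in [0, 1]."""
    gold_tokens = gold.lower().split()
    guess_tokens = guess.lower().split()
    # Count shared tokens, respecting multiplicity.
    common = sum(
        min(gold_tokens.count(t), guess_tokens.count(t)) for t in set(guess_tokens)
    )
    if common == 0:
        return 0.0
    precision = common / len(guess_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def answer_metric(example, pred, trace=None, threshold=0.8):
    score = f1_score(example.answer, pred.answer)
    if trace is None:
        # Evaluation / optimization: return the continuous score.
        return score
    # A trace was passed (bootstrapping): return a strict pass/fail decision.
    return score >= threshold


example = SimpleNamespace(answer="Paris France")
pred = SimpleNamespace(answer="Paris")

print(answer_metric(example, pred))            # continuous score (2/3 here)
print(answer_metric(example, pred, trace=[]))  # False: 2/3 is below 0.8
```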

elyxlz commented 1 year ago

I did not have a particular pattern in mind; I only started digging deeper into this library a couple of hours ago. Very happy that I can play with continuous losses, though! Going to give that a try. I'm going to have a retrieval-augmented agent evaluate results against a set of criteria and give continuous feedback.

kimianoorbakhsh commented 1 year ago

> Oh btw. Continuous metrics are already fully supported for optimization but not for candidate selection.
>
> This may already be all you need.
>
> If trace is None, you can return a continuous score and that will be used for random search or Optuna optimization.

Hello @okhat, would you please elaborate on this? What is trace here? Thanks!

bendavidsteel commented 12 months ago

Doesn't seem like these scores are currently used for Optuna optimization: https://github.com/stanfordnlp/dspy/blob/a102f3aba3aa02ed7db5d468ef1c3d463c97710b/dspy/teleprompt/teleprompt_optuna.py#L43

I would also love to be able to define metrics over the entire devset for optimization so that, for example, I could optimize for precision! Currently it seems to be fixed to optimizing for accuracy.
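To illustrate why precision needs a devset-level metric: its denominator (the number of predicted positives) depends on every prediction, so it cannot be written as an average of per-example scores. The sketch below is plain Python, not a DSPy API; `devset_precision`, `toy_program`, and the `text`/`label` fields are hypothetical names.

```python
from types import SimpleNamespace


def devset_precision(program, devset, positive_label="yes"):
    """Precision = true positives / predicted positives, over the whole devset."""
    true_pos = 0
    pred_pos = 0
    for example in devset:
        pred = program(example.text)
        if pred.label == positive_label:
            pred_pos += 1
            if example.label == positive_label:
                true_pos += 1
    # Convention chosen here: precision is 0.0 when nothing is predicted positive.
    return true_pos / pred_pos if pred_pos else 0.0


def toy_program(text):
    """Stand-in for a compiled program: predicts "yes" when the text contains "good"."""
    return SimpleNamespace(label="yes" if "good" in text else "no")


devset = [
    SimpleNamespace(text="good movie", label="yes"),
    SimpleNamespace(text="good grief", label="no"),
    SimpleNamespace(text="bad movie", label="no"),
    SimpleNamespace(text="great film", label="yes"),
]

print(devset_precision(toy_program, devset))  # 0.5
```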

okhat commented 9 months ago

Answered in depth now at: #298

One note: trace is the argument that holds the full trace of the DSPy program. It is passed to your metric while bootstrapping new examples, so you can "look back" at the inputs and outputs of every step of the program.
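A hedged sketch of a metric that "looks back" at intermediate steps. It assumes each trace entry is a `(predictor, inputs, outputs)` tuple and, for simplicity, models `outputs` as a plain dict; the step names and fields are made up for illustration, so check the actual trace structure in your DSPy version.

```python
from types import SimpleNamespace


def metric_with_trace(example, pred, trace=None):
    correct = example.answer.lower() == pred.answer.lower()
    if trace is None:
        # Plain evaluation: only the final answer matters.
        return float(correct)
    # ASSUMPTION: trace is a list of (predictor, inputs, outputs) tuples,
    # with outputs modeled as a dict. Require every step output to be non-empty.
    steps_ok = all(
        all(str(value).strip() for value in outputs.values())
        for _predictor, _inputs, outputs in trace
    )
    return correct and steps_ok


example = SimpleNamespace(answer="Paris")
pred = SimpleNamespace(answer="paris")

good_trace = [
    ("generate_query", {"question": "Capital of France?"}, {"query": "capital of France"}),
    ("generate_answer", {"context": "..."}, {"answer": "paris"}),
]
# Same final answer, but the first step produced an empty query.
bad_trace = [
    ("generate_query", {"question": "Capital of France?"}, {"query": ""}),
    ("generate_answer", {"context": "..."}, {"answer": "paris"}),
]

print(metric_with_trace(example, pred))                    # 1.0
print(metric_with_trace(example, pred, trace=good_trace))  # True
print(metric_with_trace(example, pred, trace=bad_trace))   # False
```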