This is absolutely on the short-term roadmap. We just need to identify exactly how to use these scores for building demonstrations, though.
Right now, you can implement a threshold function (if score > threshold, return True).
Did you have a particular pattern in mind?
Oh btw, continuous metrics are already fully supported for optimization, but not for candidate selection. This may already be all you need. If `trace` is None, you can return a continuous score and that will be used for random search or Optuna optimization.
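For concreteness, a minimal sketch of that pattern; the `similarity` scorer below is a toy placeholder, not part of DSPy:

```python
def similarity(gold: str, answer: str) -> float:
    # Toy placeholder scorer: token overlap in [0, 1]. Swap in your own.
    gold_tokens, answer_tokens = set(gold.split()), set(answer.split())
    return len(gold_tokens & answer_tokens) / max(len(gold_tokens), 1)

def metric(example, pred, trace=None, threshold=0.8):
    score = similarity(example.answer, pred.answer)
    if trace is None:
        # Optimization: return the continuous score directly; random
        # search or Optuna can consume it as-is.
        return score
    # Bootstrapping: threshold into a boolean so the optimizer can decide
    # whether to keep this trace as a demonstration.
    return score > threshold
```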
I did not have a particular pattern in mind; I've only started to dig deeper into this library a couple of hours ago. Very happy that I can play with continuous losses, though! Going to give that a try: I'm going to have a retrieval-augmented agent evaluate results against a set of criteria and give continuous feedback.
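As a rough sketch of that direction (not an established DSPy pattern), the judge could itself be a small DSPy program that returns a continuous score; the signature, its field names, and the `example.criteria` field are illustrative assumptions:

```python
import dspy

class AssessAnswer(dspy.Signature):
    """Assess how well the answer satisfies the criteria, from 0.0 to 1.0."""

    criteria = dspy.InputField()
    answer = dspy.InputField()
    rating = dspy.OutputField(desc="a number between 0.0 and 1.0")

judge = dspy.ChainOfThought(AssessAnswer)

def judged_metric(example, pred, trace=None):
    verdict = judge(criteria=example.criteria, answer=pred.answer)
    try:
        score = float(verdict.rating)
    except ValueError:
        score = 0.0  # treat an unparsable rating as a failure
    # Continuous feedback during optimization, boolean during bootstrapping.
    return score if trace is None else score > 0.7
```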
> Oh btw, continuous metrics are already fully supported for optimization, but not for candidate selection. This may already be all you need. If `trace` is None, you can return a continuous score and that will be used for random search or Optuna optimization.
Hello @okhat, would you please elaborate more on this? What is `trace` here? Thanks!
Doesn't seem like these scores are currently used for Optuna optimization: https://github.com/stanfordnlp/dspy/blob/a102f3aba3aa02ed7db5d468ef1c3d463c97710b/dspy/teleprompt/teleprompt_optuna.py#L43
I would also love to see metrics over the entire devset be definable for optimization, so that, for example, I could optimize for precision! Currently it seems fixed to optimizing for accuracy.
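For illustration, a hedged sketch of what such a devset-level metric could look like, computed outside the current optimizers and assuming a classification program with hypothetical `question`/`label` fields:

```python
def devset_precision(program, devset, positive="yes"):
    # Precision = TP / (TP + FP) over the whole devset. This cannot be
    # expressed as an average of independent per-example scores, which is
    # why per-example metrics effectively optimize accuracy.
    tp = fp = 0
    for example in devset:
        pred = program(question=example.question)
        if pred.label == positive:
            if example.label == positive:
                tp += 1
            else:
                fp += 1
    return tp / (tp + fp) if (tp + fp) else 0.0
```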
Answered in depth now at: #298
One note: `trace` holds the full trace of the DSPy program. It is given to your metric during bootstrapping of new examples so you can "look back" at all inputs and outputs of all steps of the program.
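A small sketch of that "look back", assuming each trace entry is a (predictor, inputs, outputs) triple and using an illustrative `rationale` field (as produced by chain-of-thought modules):

```python
def strict_metric(example, pred, trace=None):
    exact = example.answer == pred.answer
    if trace is None:
        return exact
    # Bootstrapping: walk every step of the program, not just the final
    # prediction, before accepting this run as a demonstration.
    for predictor, inputs, outputs in trace:
        rationale = getattr(outputs, "rationale", None)
        # If a step exposes a rationale, insist it is non-empty
        # (field name is illustrative).
        if rationale is not None and not rationale.strip():
            return False
    return exact
```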
Is it possible for validation metrics to return non-boolean answers?
This could bring a lot more control over the optimization strategy, e.g. weighting the losses of different metrics (see the sketch below).
Another idea is natural-language "losses", in the spirit of RLAIF, i.e., self-critique.
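A hedged sketch of the weighting idea, with two toy sub-metrics standing in for real ones; the natural-language critique variant would look much like the judge pattern sketched earlier, keeping the critique text alongside the score:

```python
def token_overlap(example, pred):
    # Toy relevance proxy in [0, 1].
    gold, out = set(example.answer.split()), set(pred.answer.split())
    return len(gold & out) / max(len(gold), 1)

def brevity(pred, target_len=50):
    # Toy fluency/length proxy in [0, 1]; answers at or under the target
    # length score highest.
    return min(target_len / max(len(pred.answer.split()), 1), 1.0)

def weighted_metric(example, pred, trace=None):
    # Weighted combination of sub-losses; the weights are illustrative.
    score = 0.7 * token_overlap(example, pred) + 0.3 * brevity(pred)
    return score if trace is None else score > 0.75
```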