Open thomasahle opened 9 months ago
@thomasahle You're right; you've made two great points about Evaluate.
First, we need the key parallelism logic to be factored out so people can do parallel steps. (By the way, this won't be too hard to make work inside modules; I know the parts that need care. It's basically dspy.settings, especially dspy.settings.trace at bootstrap time.)
Second, we need to support multi-metric evaluate, which is a smaller change.
Can I help you do a PR? :sweat_smile:
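In the meantime, here's a rough sketch (not an existing DSPy API) of how the second point could be approximated today: wrap several metric callables into one composite metric, so the current single-metric Evaluate still gets a single score while per-metric tallies are kept on the side. The MultiMetric name and the accuracy_fn / recall_fn callables below are hypothetical.

```python
# Sketch only: a composite metric for the existing single-metric Evaluate.
# Each wrapped callable follows the usual (example, pred, trace=None) -> score shape.
from collections import defaultdict

class MultiMetric:
    def __init__(self, **metrics):
        self.metrics = metrics            # name -> metric callable
        self.totals = defaultdict(float)  # running per-metric sums
        self.count = 0

    def __call__(self, example, pred, trace=None):
        scores = {name: float(fn(example, pred, trace)) for name, fn in self.metrics.items()}
        for name, score in scores.items():
            self.totals[name] += score
        self.count += 1
        # Evaluate expects a single number, so report the mean of all metrics.
        return sum(scores.values()) / len(scores)

    def summary(self):
        # Per-metric averages accumulated across the devset.
        return {name: total / self.count for name, total in self.totals.items()}

# Hypothetical usage (num_threads=1 keeps the tallies race-free):
#   metric = MultiMetric(accuracy=accuracy_fn, recall=recall_fn)
#   Evaluate(devset=devset, metric=metric, num_threads=1)(program)
#   print(metric.summary())
```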
I'm happy to send some PRs. Right now I'm just a bull in a china shop hitting random obstacles, and creating issue reports to keep track of them. I don't think this one is super important to fix right now, but I just wanted to register it. If I'm creating too much spam on the issue tracker, I'm also happy to just keep a personal list of things to look into down the line :-)
If you add tags to the GitHub issue tracker, I can mark it as "nice to have" or "not important".
Hello there, I'm also looking for something similar to this. Any recent updates? Something like:
metric = evaluate.combine(["accuracy", "recall", "precision", "f1"])
Right now Evaluate(...) only takes one metric, but often we have multiple different scores we want to test at the same time, like "accuracy", "gold_passages_retrieved", "q/s", etc. While it's not obvious how to support multiple metrics for compilation, it should be easier to do for evaluation.
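For reference, the workaround today is to run Evaluate once per metric, which reruns the whole program for every metric. A sketch, assuming accuracy_fn / recall_fn, devset, and program are defined elsewhere:

```python
from dspy.evaluate import Evaluate

# One full pass over devset per metric -- this rerun cost is what a
# multi-metric Evaluate would avoid.
metrics = {"accuracy": accuracy_fn, "recall": recall_fn}
scores = {}
for name, fn in metrics.items():
    evaluator = Evaluate(devset=devset, metric=fn, display_progress=True)
    scores[name] = evaluator(program)
print(scores)
```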