wandb / weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.
https://wandb.me/weave
Apache License 2.0

Feature request: Return both `summary` & `eval_results` from `Evaluation.evaluate` #3038

Open · TeoZosa opened this issue 1 week ago

TeoZosa commented 1 week ago

For this code: https://github.com/wandb/weave/blob/3eea8df108274ff22e4a9a382c66cc1e54f7213d/weave/flow/eval.py#L493-L510

Only `summary` is returned. For our use case, we also want `eval_results` for downstream rendering and storage, so we can present user-friendly results to non-technical stakeholders (for those folks, the Weave UI is information overload). Would this change make sense?

As a workaround, I can call `get_eval_results` and `summarize` separately, but then I lose eval tracking in the Evaluations view, since only `Evaluation.evaluate` calls are picked up there.
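A toy model of that trade-off may help. This is not Weave code: `traced`, `TRACE`, `evaluations_view`, and the exact-match scoring are hypothetical stand-ins, and the filter encodes the assumption from the comment that the Evaluations view only lists top-level `evaluate` calls. Calling the two steps directly returns both values, but nothing named `evaluate` is ever recorded:

```python
import asyncio

# Hypothetical trace store standing in for Weave's call log (not Weave's API).
TRACE: list[dict] = []


def traced(name):
    """Toy tracer: record each call of the wrapped coroutine under `name`."""
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            TRACE.append({"op": name, "output": result})
            return result
        return wrapper
    return decorator


@traced("get_eval_results")
async def get_eval_results(model, dataset):
    # One row per example: the model output and a made-up exact-match score.
    return [{"output": model(x), "correct": model(x) == y} for x, y in dataset]


@traced("summarize")
async def summarize(eval_results):
    return {"accuracy": sum(r["correct"] for r in eval_results) / len(eval_results)}


def evaluations_view():
    # Assumption from the issue: only top-level `evaluate` calls are listed.
    return [call for call in TRACE if call["op"] == "evaluate"]


async def workaround(model, dataset):
    # Call the two steps directly, as in the workaround described above.
    eval_results = await get_eval_results(model, dataset)
    summary = await summarize(eval_results)
    return summary, eval_results


summary, eval_results = asyncio.run(workaround(lambda x: x * 2, [(1, 2), (2, 4), (3, 7)]))
print(summary)             # accuracy of 2/3
print(evaluations_view())  # [] : no `evaluate` call was recorded
```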

jwlee64 commented 1 week ago

Hi @TeoZosa, I can raise this to the team tomorrow.

TeoZosa commented 6 days ago

@jwlee64 sounds good, thanks for the prompt reply 👍

FWIW, as feedback: I'm working around the issue by vendoring the code with this change:

```diff
-    @weave.op()
+    @weave.op(postprocess_output=lambda output: output[0])
     async def evaluate(self, model: Union[Callable, Model]) -> dict:
         eval_results = await self.get_eval_results(model)
         summary = await self.summarize(eval_results)

         print("Evaluation summary", summary)

-        return summary
+        return summary, eval_results
```

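A minimal sketch of why the `postprocess_output` hook makes this workaround hang together, assuming (as the vendored change relies on) that the hook only changes what gets logged while the caller still receives the full return value. The `op` decorator and `TRACE` list below are hypothetical stand-ins written for illustration, not Weave's implementation, and the dataset and scoring are made up:

```python
import asyncio
import functools

# Hypothetical stand-in for a tracing decorator (not Weave's implementation):
# the caller gets the full return value, but only postprocess_output(result)
# is written to the trace.
TRACE: list[dict] = []


def op(postprocess_output=lambda output: output):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            result = await fn(*args, **kwargs)
            TRACE.append({"op": fn.__name__, "output": postprocess_output(result)})
            return result
        return wrapper
    return decorator


@op(postprocess_output=lambda output: output[0])
async def evaluate(model, dataset):
    # Made-up scoring: exact match between model output and expected label.
    eval_results = [
        {"input": x, "output": model(x), "correct": model(x) == y}
        for x, y in dataset
    ]
    summary = {"correct_count": sum(r["correct"] for r in eval_results)}
    return summary, eval_results


summary, eval_results = asyncio.run(evaluate(lambda x: x * 2, [(1, 2), (2, 4), (3, 7)]))
print(summary)             # {'correct_count': 2}
print(TRACE[0]["output"])  # {'correct_count': 2}, only the tuple's first element
```

With the vendored change, a caller would presumably unpack `summary, eval_results = await evaluation.evaluate(model)`, while the logged output of the `evaluate` call remains just the summary dict, as before the change.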
jwlee64 commented 6 days ago

Hi @TeoZosa, we are going to have someone tackle this in the next week or so, or at minimum document a better way to get `eval_results` with the current API.

TeoZosa commented 5 days ago

Got it. Thanks for the update @jwlee64, keep me posted! 🙏