Export evaluation results

wandb / weave

Weave is a toolkit for developing AI-powered applications, built by Weights & Biases.

https://wandb.me/weave

Apache License 2.0


Export evaluation results #1603

Open nthomsencph opened 4 months ago

nthomsencph commented 4 months ago

Hi, looking through the docs and code, I cannot find a way to export the evaluation results. I'd like to be able to do something like this:

import weave

sample = ...  # input data
evaluator = ...  # eval function
model = ...  # weave Model

# define eval
evaluation = weave.Evaluation(
    dataset=sample,
    scorers=[evaluator],
    trials=1,
)

# run eval (top-level await works in a notebook; in a script, wrap the call in asyncio.run)
output = await evaluation.evaluate(model)

# how can I do something like this?
# (`results` is the attribute I wish existed; I could not find it in the current API)
evaluated_rows = evaluation.results  # contains the traces with predictions and evaluator output

In other words, I would like to download the results table that the evaluation UI shows.

I feel like this should be a standard feature?

jwlee64 commented 4 months ago

Hi @nthomsencph. Yes, we really need an export button. There was some discussion within the team about whether to hold off until we had an export button that acts more intelligently (dealing with expanded refs and better matching the UI).

I will put up a PR that adds an export button that uses the MUI Data Grid APIs to export the table to CSV. Note that this will not work with expanded refs (though that functionality should come relatively soon).

https://github.com/wandb/weave/pull/1606

nthomsencph commented 4 months ago

Thanks for the quick reply. Looking forward to this feature.

A button in the UI would be great but I was looking for a programmatic way of exporting the table.

jwlee64 commented 4 months ago

Hi @nthomsencph! I just merged the Export to CSV PR, which adds a button in the UI. That should hopefully help you make some headway. (Note that this export is not the final form of the feature: we plan to move the export to the server to make it faster and to fill in ref information.)

I can create an internal ticket for exporting the table programmatically. https://wandb.atlassian.net/browse/WB-18680

If possible, could you please specify what you intend to do with the data? Understanding the purpose would help us build for that use case more directly.

nthomsencph commented 4 months ago

Hi @jwlee64 - Thanks for the swift reply and action. I will check it out today.

Please submit an internal ticket as well.

The reason for the programmatic export is that we run multiple experiments with different LLMs in our research. We want the freedom to fetch the table from each evaluation so that we can derive descriptive statistics and run, e.g., hypothesis tests and correlation analyses.
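
To make the use case concrete, here is a minimal sketch of that kind of analysis. It assumes the evaluation table has already been exported to CSV with one row per evaluated example. The column names (`model`, `score`, `latency_s`) and the inline sample data are hypothetical, made up for illustration, and do not reflect the actual export format. It uses only the Python standard library and a simple permutation test as a stand-in for whatever hypothesis test the research calls for.

```python
import csv
import io
import math
import random
import statistics

# Hypothetical export: the columns and values below are invented for this sketch.
EXPORT = """model,score,latency_s
model_a,0.80,1.2
model_a,0.60,0.9
model_a,0.90,1.5
model_b,0.50,0.7
model_b,0.40,0.6
model_b,0.70,1.1
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to float."""
    return [
        {
            "model": row["model"],
            "score": float(row["score"]),
            "latency_s": float(row["latency_s"]),
        }
        for row in csv.DictReader(io.StringIO(text))
    ]


def describe(rows):
    """Per-model count, mean, and sample standard deviation of the score."""
    by_model = {}
    for row in rows:
        by_model.setdefault(row["model"], []).append(row["score"])
    return {
        name: {
            "n": len(scores),
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores),
        }
        for name, scores in by_model.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test on the difference in means of a and b."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_resamples


rows = load_rows(EXPORT)
summary = describe(rows)
r = pearson([x["score"] for x in rows], [x["latency_s"] for x in rows])
p_value = permutation_test(
    [x["score"] for x in rows if x["model"] == "model_a"],
    [x["score"] for x in rows if x["model"] == "model_b"],
)
print(summary)
print(f"score/latency correlation: {r:.3f}, permutation p-value: {p_value:.3f}")
```

With a programmatic export, `EXPORT` would simply be replaced by the table fetched for each evaluation, and the same functions could be run across all experiments in a loop.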