mlcommons / modelbench

Run safety benchmarks against AI models and view detailed reports showing how well they performed.
https://mlcommons.org/ai-safety/
Apache License 2.0

Benchmark outcomes record #392

Closed wpietri closed 3 months ago

wpietri commented 3 months ago

Produces a JSON version of the benchmark alongside the HTML files. I'm not sure this is totally right; looking forward to feedback on the format.
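For context, a minimal sketch of what "a JSON version of the benchmark alongside the HTML files" could look like. The field names, record structure, and `write_benchmark_record` helper here are all assumptions for illustration, not the actual format this PR introduces:

```python
import json
from pathlib import Path

def write_benchmark_record(benchmark: dict, output_dir: Path) -> Path:
    """Write a JSON record of benchmark outcomes next to the HTML reports.

    Hypothetical structure: the real record produced by the PR may differ.
    """
    record = {
        "benchmark": {
            "uid": benchmark["uid"],
            "version": benchmark.get("version"),
        },
        "scores": benchmark.get("scores", {}),
    }
    path = output_dir / f"{benchmark['uid']}.json"
    path.write_text(json.dumps(record, indent=2))
    return path

# Usage: write a record for a made-up benchmark into the current directory
out = write_benchmark_record(
    {"uid": "general_chat_bot_benchmark-0.5", "version": "0.5",
     "scores": {"overall": 3}},
    Path("."),
)
```

The point of emitting machine-readable JSON next to the HTML is that downstream tooling can consume outcomes without scraping the reports.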

github-actions[bot] commented 3 months ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

wpietri commented 3 months ago

Ok, @bkorycki and @dhosterman, this is actually ready for final review now.

wpietri commented 3 months ago

Ok @dhosterman and @bkorycki, I think I have resolved all the outstanding issues and requests on this one.

dhosterman commented 3 months ago

This works great so far, but it fails when attempting to use --anonymize.

dhosterman commented 3 months ago

I also notice that in the content data, we have a uid for the benchmark that is different from the uid in the benchmark data. We might want to go through and make sure those are aligned, as well as the versions.
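The kind of alignment check being asked for could be sketched like this. The record layout (`benchmark` vs. `content` sections, `benchmark_uid`/`benchmark_version` keys) is a hypothetical stand-in for whatever structure the PR's JSON actually uses:

```python
def check_record_consistency(record: dict) -> list[str]:
    """Flag mismatches between the benchmark's own uid/version and the
    uid/version referenced in the content data.

    Assumes a hypothetical record shape; adjust keys to the real format.
    """
    problems = []
    bench = record.get("benchmark", {})
    content = record.get("content", {})
    for field in ("uid", "version"):
        expected = bench.get(field)
        actual = content.get(f"benchmark_{field}")
        if actual != expected:
            problems.append(
                f"content benchmark_{field} {actual!r} "
                f"does not match benchmark {field} {expected!r}"
            )
    return problems

# Example: a record whose content section references a stale uid
sample = {
    "benchmark": {"uid": "general_chat-0.5", "version": "0.5"},
    "content": {"benchmark_uid": "general_chat-0.4", "benchmark_version": "0.5"},
}
issues = check_record_consistency(sample)
```

Running a check like this in CI would catch the drift between the two uids (and versions) before a record ships.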