mlcommons / modelbench

Run safety benchmarks against AI models and view detailed reports showing how well they performed.
https://mlcommons.org/ai-safety/
Apache License 2.0

Pre v1 cleanup - first PR #410

Closed wpietri closed 3 weeks ago

wpietri commented 1 month ago

Assorted small cleanups of confusing code and unused files.

github-actions[bot] commented 1 month ago

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

wpietri commented 1 month ago

> Looks good! I especially like the find_by_name addition to benchmark definition.

That was definitely inspired by your suggestion.

> I’m curious about the choice to remove the v0.5 hazard content files because I thought we wanted to preserve v0.5 for now.

Those files were never actually used. I believe all actual 0.5 functionality is preserved.

wpietri commented 1 month ago

Added a fair bit more cleanup, this time around the SUTs.

bkorycki commented 1 month ago

I guess I don't really understand the purpose of ModelGaugeSut. It doesn't seem to provide any functionality that doesn't already exist in the regular modelgauge SUT objects.

wpietri commented 4 weeks ago

> I guess I don't really understand the purpose of ModelGaugeSut. It doesn't seem to provide any functionality that doesn't already exist in the regular modelgauge SUT objects.

It has definitely gotten thinner over time. It originally existed because there was also a HelmSut object, and because we wanted user-friendly display names for SUTs. Now we're not using HELM, and the user-friendly text has moved into the TOML files. The main remaining uses are actually of the superclass, SutDescription, which is just a SUT key; ModelGaugeSut adds the ability to look up a modelgauge SUT instance from that key. We use SutDescription in testing, to keep a list of SUTs whether or not we have the proper keys, and to create anon SUTs.

I think we'll keep that distinction, especially now that we have JSON output of the Benchmark, which can be used to render reports. Instantiating a modelgauge SUT requires having secret keys for the provider, but rendering a report from the JSON doesn't need those.
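
For readers following the thread, here is a minimal sketch of the split described above. It is not the actual modelbench code. Only the class names SutDescription and ModelGaugeSut come from the discussion; the factory registry, FakeProviderSut, MissingSecretError, the `instance` method, the `demo_api_key` secret name, and the JSON shape are all hypothetical stand-ins chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Mapping


class MissingSecretError(Exception):
    """Hypothetical error: a provider API key is not available."""


@dataclass(frozen=True)
class SutDescription:
    """Just a SUT key. Safe to create without any provider secrets."""
    key: str


@dataclass(frozen=True)
class FakeProviderSut:
    """Hypothetical stand-in for a real modelgauge SUT object."""
    uid: str
    api_key: str


def _make_demo_sut(secrets: Mapping[str, str]) -> FakeProviderSut:
    # Building the real object is the step that needs a secret.
    if "demo_api_key" not in secrets:
        raise MissingSecretError("demo_api_key is required for demo-sut")
    return FakeProviderSut("demo-sut", secrets["demo_api_key"])


# Hypothetical registry mapping SUT keys to factories (an assumption,
# not the modelgauge API).
SUT_FACTORIES: Dict[str, Callable[[Mapping[str, str]], FakeProviderSut]] = {
    "demo-sut": _make_demo_sut,
}


class ModelGaugeSut(SutDescription):
    """Adds one thing to SutDescription: looking up a live SUT instance."""

    def instance(self, secrets: Mapping[str, str]) -> FakeProviderSut:
        return SUT_FACTORIES[self.key](secrets)


def render_report(benchmark_json: str) -> str:
    """Render a text report from JSON alone; no secrets are touched."""
    grades = json.loads(benchmark_json)["grades"]  # assumed JSON shape
    suts = [SutDescription(key) for key in grades]
    return "\n".join(f"{sut.key}: {grades[sut.key]}" for sut in suts)
```

Under these assumptions, `render_report('{"grades": {"demo-sut": "M"}}')` works on a machine with no provider keys at all, while `ModelGaugeSut("demo-sut").instance({})` fails because the secret is missing. That is the reason given in the thread for keeping the key-only class separate from the one that can instantiate a SUT.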