Open fcanogab opened 1 month ago
Thanks for bringing this up! I think evaluating this benchmark is a worthwhile exercise for us. It looks like the benchmark is still at the proof-of-concept (POC) stage, but they have a repo with steps outlined for trying it out: https://github.com/mlcommons/modelbench
Is this something you might have the bandwidth to look into, @fcanogab?
We might also look at unitxt (an IBM open source project).
Jonathan Bnayahu has added some safety-related benchmarks, among others; see this search for the list:
@hemajv, yes, I would like to try to work on this myself.
Thanks for the hint @erikerlandson. I'll take a look at it.
There are different frameworks for measuring the safety/harmfulness of a fine-tuned model and benchmarking it against other models. For example, MLCommons defines a framework that can be used for this.