Precision/Recall Curves

meta-llama / PurpleLlama

Set of tools to assess and improve LLM security.

Other

2.73k stars 453 forks source link

Thank you for releasing Llama Guard 2, it looks like a very promising model!

I was wondering if it would be feasible to release precision/recall curves or numbers by harm category, for your internal benchmark evaluation? Or is there any hope of publicly releasing a small labeled test set for the community to evaluate for ourselves?

From Table 2 in the model card, it looks like a classification threshold of 0.5 results in a rather high FNRs for some categories and I'd like to use a classification threshold with more balanced errors, but am not sure how to go about tuning it myself because the new MLCommons harm taxonomy doesn't map 1:1 with public content classification datasets like OpenAI's moderation dataset.

meta-llama / PurpleLlama

Precision/Recall Curves #34