meta-llama / PurpleLlama

Set of tools to assess and improve LLM security.

Evaluation script released? #14

Closed: hxhcreate closed this issue 7 months ago

hxhcreate commented 9 months ago

I have been following this work. It would be greatly appreciated if you could release the evaluation code so that we can reproduce your results!

SimonWan commented 7 months ago

Hi there, could you please provide more information, such as whether your question is about Llama Guard or CyberSecEval, and what exact script you are looking for? Thanks.

mbhatt1 commented 7 months ago

Which script are you looking for?

ujjwalkarn commented 7 months ago

Hi, if you're looking for the Llama Guard evaluation, Llama recipes has a script for running inference. We then compute the metrics with sklearn's precision_score, recall_score, f1_score, and average_precision_score functions. Is this what you're looking for?
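For anyone landing here, a minimal sketch of that metric step might look like the following. It is not the actual evaluation script: the `compute_metrics` helper, the 0.5 threshold, and the toy labels and scores are all assumptions for illustration, and it presumes you already have a binary ground-truth label (1 = unsafe) and an unsafe-probability score per example from the inference run.

```python
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)


def compute_metrics(y_true, y_score, threshold=0.5):
    """Hypothetical helper: score binary classifier outputs.

    y_true  -- ground-truth labels, 1 for the positive (unsafe) class
    y_score -- predicted probability of the positive class
    threshold -- assumed cut-off for turning scores into hard labels
    """
    # Threshold-based metrics need hard 0/1 predictions.
    y_pred = [int(score >= threshold) for score in y_score]
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        # Average precision is computed from the raw scores, not the
        # thresholded predictions.
        "average_precision": average_precision_score(y_true, y_score),
    }


# Toy data, made up purely to show the call shape.
y_true = [1, 0, 1, 1, 0, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.7, 0.1]

metrics = compute_metrics(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that precision, recall, and F1 depend on the chosen threshold, while average precision summarizes the whole precision-recall curve, so the two kinds of numbers are not directly interchangeable.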

I will close this issue, but please reopen if you have further questions. Thanks!