How can we get per-category scores like those returned by the OpenAI moderation endpoint (https://platform.openai.com/docs/guides/moderation/overview)? Also, do you plan to release the eval scripts behind the results in the paper and in the model card (https://github.com/facebookresearch/PurpleLlama/blob/main/Llama-Guard/MODEL_CARD.md) for the OpenAI moderation and ToxicChat benchmarks?

Hi there! llama-recipes provides a script for running Llama Guard inference. We then use sklearn's precision_score, recall_score, f1_score, and average_precision_score to compute the metrics. Please let us know if this doesn't help!
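For reference, here is a minimal sketch of that metric computation. It assumes you already have binary gold labels from the benchmark and a per-example "unsafe" score from Llama Guard inference; the variable names and example data are hypothetical, not taken from the actual eval scripts:

```python
from sklearn.metrics import (
    average_precision_score,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical example data: 1 = unsafe, 0 = safe.
# y_true would come from the benchmark labels (e.g., the OpenAI
# moderation eval set or ToxicChat); y_score from Llama Guard
# inference, e.g., the probability assigned to the unsafe verdict.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.92, 0.10, 0.65, 0.80, 0.30, 0.05, 0.40, 0.55]

# Threshold the scores to get hard predictions for precision/recall/F1.
threshold = 0.5
y_pred = [int(s >= threshold) for s in y_score]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
# Average precision is threshold-free and uses the raw scores.
print("AP:       ", average_precision_score(y_true, y_score))
```

To report per-category scores like the OpenAI moderation endpoint, you would repeat this computation once per harm category, treating each category as its own binary classification task.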