The current evaluation metrics supported by llm-eval are robust. However, upon reviewing the documentation, I found that the current repo doesn't account for evaluating model toxicity. Assessing LLMs for toxicity is tricky and there are (surprisingly) few comprehensive, tested open source solutions for doing so. I've identified a few options that could be added to the llm-eval Google Colab notebook.
TrustLLM
What is it? A Python package that assesses LLM trustworthiness by scoring model responses to a mixture of well-known evaluation datasets.
How does it work? Download the TrustLLM datasets, use TrustLLM with your (supported) model to generate responses to those datasets, then use TrustLLM to score the responses for Truthfulness, Safety, Fairness, Robustness, Privacy, and Ethics (rough sketch below).
Works with models served via APIs, local publicly available models (HuggingFace), and online models via Replicate or DeepInfra
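For context, here's a rough sketch of what that three-step workflow might look like in the Colab notebook. It's based on my reading of the TrustLLM README, so the module paths, arguments, and file locations are assumptions that would need checking against the current release:

```python
# Rough sketch of the three-step TrustLLM workflow described above.
# Module paths, class names, and arguments follow my reading of the TrustLLM
# README and should be verified against the installed release.
from trustllm.dataset_download import download_dataset
from trustllm.generation.generation import LLMGeneration
from trustllm.task.pipeline import run_safety

# 1. Download the TrustLLM evaluation datasets.
download_dataset(save_path="TrustLLM_data")

# 2. Generate responses for one section (here: safety) with a local
#    Hugging Face model; Replicate/DeepInfra online models are also supported.
llm_gen = LLMGeneration(
    model_path="mistralai/Mistral-7B-Instruct-v0.2",  # example model, just an assumption
    test_type="safety",
    data_path="TrustLLM_data",
    online_model=False,
    max_new_tokens=512,
)
llm_gen.generation_results()

# 3. Score the generated responses for the safety dimension. The toxicity
#    sub-metric reportedly relies on the Perspective API, so a key would
#    need to be configured first (worth confirming in the docs).
safety_results = run_safety(
    jailbreak_path="generation_results/jailbreak.json",  # placeholder path
)
print(safety_results)
```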
Questions
Could we integrate TrustLLM with Runpod to generate the responses that are then evaluated by TrustLLM?
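To make that question concrete, one option would be to serve the model from a Runpod pod behind an OpenAI-compatible endpoint (e.g. vLLM), generate the responses ourselves, and then hand the filled-in JSON to TrustLLM's evaluators. The endpoint URL, file paths, and the `"prompt"`/`"res"` field names below are assumptions to verify against the TrustLLM data format:

```python
# Hypothetical sketch: generate TrustLLM responses against an OpenAI-compatible
# endpoint (e.g. a vLLM server running on a Runpod pod), then pass the results
# to TrustLLM's evaluators. The endpoint URL, dataset path, and the
# "prompt"/"res" keys are assumptions, not confirmed API details.
import json
import requests

RUNPOD_ENDPOINT = "https://<pod-id>-8000.proxy.runpod.net/v1/chat/completions"  # placeholder

def generate(prompt: str) -> str:
    """Query the model served on the Runpod pod for a single prompt."""
    resp = requests.post(
        RUNPOD_ENDPOINT,
        json={
            "model": "local-model",  # whatever the server is configured to serve
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Fill in the "res" field that TrustLLM's evaluators expect (assumption).
with open("TrustLLM_data/safety/jailbreak.json") as f:  # path is an assumption
    items = json.load(f)
for item in items:
    item["res"] = generate(item["prompt"])
with open("jailbreak_responses.json", "w") as f:
    json.dump(items, f, indent=2)
```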
I'm more than happy to further discuss and pick this issue up myself!
Sorry it took so long for me to respond. Yeah, I think this would be a great addition to llm-autoeval. You're more than welcome to add it if you're still interested! :)