The current evaluation metrics supported by llm-eval are robust. However, upon reviewing the documentation, I found that the current repo doesn't account for evaluating model toxicity. Assessing LLMs for toxicity is tricky and there are (surprisingly) few comprehensive, tested open source solutions for doing so. I've identified a few options that could be added to the llm-eval Google Colab notebook.
TrustLLM
What is it? A Python package that assesses LLM trustworthiness by scoring model responses to a mixture of well-known evaluation datasets.
How does it work? Download the TrustLLM datasets, use TrustLLM with your (supported) model to generate responses to those datasets, then use TrustLLM to score the responses for Truthfulness, Safety, Fairness, Robustness, Privacy, and Ethics (rough sketch below).
Works with models served via APIs, local publicly available models (HuggingFace), and online models via Replicate or DeepInfra
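For context, here's a rough sketch of what that three-step workflow might look like in the Colab notebook. It's based on my reading of the TrustLLM README, so the module paths, arguments, and file locations are assumptions that would need checking against the current release:

```python
# Rough sketch of the three-step TrustLLM workflow described above.
# Module paths, class names, and arguments follow my reading of the TrustLLM
# README and should be verified against the installed release.
from trustllm.dataset_download import download_dataset
from trustllm.generation.generation import LLMGeneration
from trustllm.task.pipeline import run_safety

# 1. Download the TrustLLM evaluation datasets.
download_dataset(save_path="TrustLLM_data")

# 2. Generate responses for one section (here: safety) with a local
#    Hugging Face model; Replicate/DeepInfra online models are also supported.
llm_gen = LLMGeneration(
    model_path="mistralai/Mistral-7B-Instruct-v0.2",  # example model, just an assumption
    test_type="safety",
    data_path="TrustLLM_data",
    online_model=False,
    max_new_tokens=512,
)
llm_gen.generation_results()

# 3. Score the generated responses for the safety dimension. The toxicity
#    sub-metric reportedly relies on the Perspective API, so a key would
#    need to be configured first (worth confirming in the docs).
safety_results = run_safety(
    jailbreak_path="generation_results/jailbreak.json",  # placeholder path
)
print(safety_results)
```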
Questions
Could we integrate TrustLLM with Runpod to generate the responses that are then evaluated by TrustLLM?
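To make that question concrete, one option would be to serve the model from a Runpod pod behind an OpenAI-compatible endpoint (e.g. vLLM), generate the responses ourselves, and then hand the filled-in JSON to TrustLLM's evaluators. The endpoint URL, file paths, and the `"prompt"`/`"res"` field names below are assumptions to verify against the TrustLLM data format:

```python
# Hypothetical sketch: generate TrustLLM responses against an OpenAI-compatible
# endpoint (e.g. a vLLM server running on a Runpod pod), then pass the results
# to TrustLLM's evaluators. The endpoint URL, dataset path, and the
# "prompt"/"res" keys are assumptions, not confirmed API details.
import json
import requests

RUNPOD_ENDPOINT = "https://<pod-id>-8000.proxy.runpod.net/v1/chat/completions"  # placeholder

def generate(prompt: str) -> str:
    """Query the model served on the Runpod pod for a single prompt."""
    resp = requests.post(
        RUNPOD_ENDPOINT,
        json={
            "model": "local-model",  # whatever the server is configured to serve
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 512,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Fill in the "res" field that TrustLLM's evaluators expect (assumption).
with open("TrustLLM_data/safety/jailbreak.json") as f:  # path is an assumption
    items = json.load(f)
for item in items:
    item["res"] = generate(item["prompt"])
with open("jailbreak_responses.json", "w") as f:
    json.dump(items, f, indent=2)
```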
I'm more than happy to further discuss and pick this issue up myself!
Sorry it took so long for me to respond. Yeah, I think this would be a great addition to llm-autoeval. You're more than welcome to add it if you're still interested! :)