mlflow / mlflow-website


Evaluate LLMs with custom metrics with LLM as a judge #77

Open iRahulPandey opened 5 months ago

iRahulPandey commented 5 months ago

Summary

This template captures a few base requirements that must be met before filing a PR that contains a new blog post submission.

Please fill out this form in its entirety so that an MLflow maintainer can review and work with you in the process of drafting your blog content and in reviewing your blog submission PR.

PRs that are filed without a linked Blog Post Submission issue and a subsequent agreement on the content and topics covered for the blog post are not guaranteed to be reviewed or merged.

Acknowledgements

Proposed Title

Evaluate LLMs with custom metrics with LLM as a judge

Abstract

This blog post explores using large language models (LLMs) as automated judges to evaluate the output quality of retrieval-augmented generation (RAG) pipelines in MLflow. RAG pipelines combine information retrieval with language models to generate outputs grounded in relevant source text. The post shows how MLflow's mlflow.evaluate() function can use an LLM to score RAG outputs on dimensions such as relevance, coherence, and factuality, as well as on custom metrics, providing an automated way to assess both the retrieved information and the generated text.

Blog Type

Topics Covered in Blog

Thank you for your proposal! An MLflow Maintainer will reach out to you with next steps!

BenWilson2 commented 5 months ago

Great topic selection! Let's be sure to cover:

  1. Built-in metrics that are defined by specifying a particular constant in the model_type argument of evaluate()
  2. Using extra_metrics to add pre-built metrics on top of the default metrics that are enabled by setting model_type
  3. Building a custom metric via make_genai_metric() (and using it)
  4. Building a fully customized metric via make_genai_metric_from_prompt() (and using it)
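
For drafting purposes, the pattern behind a custom judge metric can be sketched in plain Python: build a grading prompt from a metric definition and rubric, send it to a judge, parse a score, and aggregate. This is only an illustration under stated assumptions and does not call MLflow. `JUDGE_PROMPT`, `make_judge_metric`, and `stub_judge` are hypothetical names, the prompt wording is invented, and the stub stands in for a real LLM call so the snippet runs offline.

```python
import re
from statistics import mean
from typing import Callable, Dict, List, Optional

# Hypothetical prompt template; MLflow's own judge prompts are not reproduced here.
JUDGE_PROMPT = (
    "You are grading a model answer for the metric '{name}'.\n"
    "Definition: {definition}\n"
    "Rubric: {rubric}\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply in the form 'score: <1-5>'."
)


def make_judge_metric(
    name: str,
    definition: str,
    rubric: str,
    judge: Callable[[str], str],
) -> Callable[[List[Dict[str, str]]], Dict[str, object]]:
    """Return a metric function that asks `judge` to score each row from 1 to 5."""

    def metric(rows: List[Dict[str, str]]) -> Dict[str, object]:
        scores: List[Optional[int]] = []
        for row in rows:
            prompt = JUDGE_PROMPT.format(
                name=name,
                definition=definition,
                rubric=rubric,
                question=row["question"],
                answer=row["answer"],
            )
            reply = judge(prompt)
            # Parse the score; an unparseable reply is recorded as None.
            match = re.search(r"score:\s*([1-5])", reply, re.IGNORECASE)
            scores.append(int(match.group(1)) if match else None)
        valid = [s for s in scores if s is not None]
        return {"scores": scores, "mean": mean(valid) if valid else None}

    return metric


def stub_judge(prompt: str) -> str:
    """Offline stand-in for an LLM call: favors answers that mention MLflow."""
    answer = prompt.split("Answer: ", 1)[1].split("\n", 1)[0]
    return "score: 5" if "MLflow" in answer else "score: 2"


relevance = make_judge_metric(
    name="relevance",
    definition="How well the answer addresses the question.",
    rubric="1 means off-topic, 5 means fully on-topic.",
    judge=stub_judge,
)

rows = [
    {"question": "Which tool tracks experiments?", "answer": "MLflow tracks runs and metrics."},
    {"question": "Which tool tracks experiments?", "answer": "I am not sure."},
]
result = relevance(rows)
print(result)
```

In the blog itself, the stub would be replaced by a real judge model and the helper by the MLflow metric factories named in the list above. The shape (definition, rubric, per-row score, aggregate) is what this sketch is meant to convey.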

@iRahulPandey please let me know how you would prefer to submit working drafts (either through Google Docs or via a Draft PR in .md format) and I'll be happy to review progress and help you work toward getting it published in the blog!

iRahulPandey commented 5 months ago

@BenWilson2 Sure! The blog will cover all of these points.

I will submit a draft via a PR in .md format.