relari-ai / continuous-eval

Data-Driven Evaluation for LLM-Powered Applications
https://continuous-eval.docs.relari.ai/
Apache License 2.0

Integration with DSPy #64

Open · pantonante opened this issue 4 months ago

pantonante commented 4 months ago

Implement a DspyMetricAdapter (or similar name) that allows users to utilize metrics defined as DSPy modules. This adapter will serve as a bridge, converting a DSPy metric into a metric compatible with continuous-eval.

Technical Details:

dianetc commented 4 months ago

Scoping this out a bit further. Two questions:

1) Do we want this as a required or optional dependency? I'm guessing the latter.

We would need to add dspy-ai = ">=0.1.9" to the project's pyproject.toml

2) The handling of the DspyMetricAdapter

This would essentially require adding a class to metrics/base.py

Something like this? 👇

import dspy
from typing import Any, Dict

from continuous_eval.metrics.base import Metric  # already in scope if this is added to metrics/base.py itself


class DspyMetricAdapter(Metric):
    def __init__(self, dspy_module: dspy.Module):
        super().__init__()
        self.dspy_module = dspy_module

    def __call__(self, **kwargs) -> Dict[str, Any]:
        # Run the wrapped DSPy module on the metric inputs
        dspy_outputs = self.dspy_module(**kwargs)
        # Convert the DSPy outputs into the dict continuous-eval expects
        return self._convert_outputs(dspy_outputs)

    def _convert_outputs(self, outputs: Any) -> Dict[str, Any]:
        # Logic to convert DSPy module outputs to continuous-eval outputs
        metric_outputs = {}
        return metric_outputs
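For the conversion step, one possibility is to read named fields off the returned prediction. A minimal sketch of the method body, assuming the wrapped module returns a dspy.Prediction and using illustrative field names ("score", "reasoning") that are not part of either library's API:

    def _convert_outputs(self, outputs: Any) -> Dict[str, Any]:
        # Illustrative: pull fields off the dspy.Prediction by attribute;
        # the real field names depend on the module's signature
        return {
            "score": getattr(outputs, "score", None),
            "reasoning": getattr(outputs, "reasoning", None),
        }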

What is the desired flow? By flow I mean: do we want a user to be able to do something like this? 👇

import dspy
from continuous_eval.metrics.base import DspyMetricAdapter

dspy_module = dspy.A_Module(...)
metric = DspyMetricAdapter(dspy_module)
result = metric(...)

pantonante commented 4 months ago

Hi Diane, thanks for working on the issue!

1) Yes, it should be an optional dependency.
2) That's correct, but I would not put the DspyMetricAdapter in the metrics/base.py file; maybe metrics/integrations/dspy.py? We do not want to raise an error if DSPy is not installed, or add redundant code to avoid it.
3) The flow looks good to me!
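A minimal sketch of how the optional import could be guarded in a hypothetical metrics/integrations/dspy.py (the file layout and error message are assumptions, not decisions from this thread):

# metrics/integrations/dspy.py (hypothetical location)
from continuous_eval.metrics.base import Metric

try:
    import dspy  # optional dependency
except ImportError:
    dspy = None


class DspyMetricAdapter(Metric):
    def __init__(self, dspy_module):
        if dspy is None:
            # Importing continuous-eval stays error-free without DSPy;
            # we only fail when the adapter is actually instantiated
            raise ImportError("DspyMetricAdapter requires the optional dspy-ai package")
        super().__init__()
        self.dspy_module = dspy_module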

dianetc commented 4 months ago

I need to understand "the adapter will wrap a DSPy module representing a metric" and "conversion of input data and output results" in more detail.

1) "the adapter will wrap a DSPy module representing a metric"

What exactly do we want a user to use DSPy for within the continuous-eval framework?

A) Is the goal to be able to run DSPy but evaluate using one of the continuous-eval metrics?

This seems unlikely, given that the class DspyMetricAdapter(Metric) structure indicates this class is a metric in itself. Also, if A) were the case, this wouldn't be a good first issue given how many metrics there are to consider.

B) Is the goal rather to allow a user to use metrics from the DSPy library within the continuous_eval framework? This would also be confusing, since DSPy doesn't have many built-in metrics and it's more on the user to create their own.

2) "conversion of input data and output results"

Answering the first should help here, but what exactly would we want to output?

pantonante commented 4 months ago

The idea is to use a DSPy module as a metric. You define your signature, wrap the signature in a module (optionally, you can optimize it), and use it within a continuous-eval pipeline. For example, if a user created a DSPy module to measure the tone of an answer, it could look something like this:

import dspy

class Tone(dspy.Module):
    def __init__(self):
        super().__init__()
        self._signature = dspy.ChainOfThought(...)  # a tone-assessment signature goes here

    def forward(self, answer, ground_truth):
        return self._signature(answer=answer, ground_truth=ground_truth)
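For concreteness, the elided signature could be defined along these lines (a sketch; the ToneSignature class and its fields are illustrative, not from this thread):

import dspy

class ToneSignature(dspy.Signature):
    """Assess whether the answer's tone matches the ground truth."""

    answer = dspy.InputField()
    ground_truth = dspy.InputField()
    tone = dspy.OutputField(desc="e.g. formal, casual, neutral")

# and then, inside Tone.__init__:
#     self._signature = dspy.ChainOfThought(ToneSignature)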

We want this Tone module to become a metric: something like tone_metric = DspyMetricAdapter(Tone())

so you could define your evaluation pipeline:

pipeline = SingleModulePipeline(
    dataset=dataset,
    eval=[
        tone_metric.use(
            answer=ModuleOutput(),
            ground_truth=dataset.ground_truth,
        ),
    ],
)
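
Note that the keyword names passed to use() here (answer, ground_truth) match the Tone module's forward signature, which is how the adapter can forward its inputs. Outside a pipeline, the adapted metric could then also be called directly, e.g. (same assumptions as above):

tone_metric = DspyMetricAdapter(Tone())
result = tone_metric(answer="Sure, happy to help!", ground_truth="Certainly, I would be glad to assist.")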