stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License
16.82k stars, 1.3k forks

Issues Combining BootstrapFewShot and dspy.context #1391

Closed: thiagodsd closed this issue 4 weeks ago

thiagodsd commented 1 month ago

I'm encountering a problem when using BootstrapFewShot with custom language models (LM) and datasets. Specifically, I receive an error when setting max_rounds to a value greater than 1:

...

File ~/venv/lib/python3.9/site-packages/dspy/teleprompt/bootstrap.py:182 in BootstrapFewShot._bootstrap_one_example(self, example, round_idx)
   180 with dsp.settings.context(trace=[], **self.teacher_settings):
   181 lm = dsp.settings.lm
-->182 lm.lm.copy(temperature=0.7+0.001*round_idx) if round_idx > 0 else lm

AttributeError: 'NoneType' object has no attribute 'copy'

I’m unable to share detailed logs or code snippets due to company policies, but I can provide a general outline of my setup:

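import dspy
from dspy.teleprompt import BootstrapFewShot
from sagemaker.predictor import Predictor
# Imports for the LM and Dataset base classes are omitted from this outline.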

class CustomLMClient(LM):
    def __init__(self, model, **kwargs):
        self.provider = "default"
        self.model = model
        self.client = Predictor(endpoint_name=self.model)
        ...

class CustomParquetDataset(Dataset):
    ...

class Summarizer(dspy.Signature):
    ...

class Judge(dspy.Signature):
    ...

lm_1 = CustomLMClient(model="mixtral")
lm_2 = CustomLMClient(model="mixtral")

class CustomModule(dspy.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.summarizer = dspy.ChainOfThought(Summarizer)
        self.summarizer._compiled = True
        self.generate_answer = dspy.ChainOfThought(Judge)

    def forward(self, text_field):
        with dspy.context(lm=lm_1, model="summarizer"):
            ...
        with dspy.context(lm=lm_2, model="judge"):
            ...
        return dspy.Prediction(...)

def risk_metric(example, pred, trace=None):
    ...

# dataset definitions here

teleprompter = BootstrapFewShot(metric=risk_metric, max_rounds=5, max_errors=3)
compile_lm = teleprompter.compile(CustomModule(), trainset=trainset)

The content of the experiment seems unrelated to the issue since everything works as expected when max_rounds=1. However, setting max_rounds to a value greater than 1 causes the process to complete one iteration (the progress bar reaches 100%) before triggering the error mentioned above.

okhat commented 1 month ago

Hey @thiagodsd! Thanks for opening this issue.

> Specifically, I receive an error when setting max_rounds to a value greater than 1:

This seems to be the core issue. When max_rounds > 1, BootstrapFewShot tries to copy your LM (with a slightly different temperature) for every round after the first.

It seems that your LM object does not support .copy?
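To make the per-round behaviour concrete, here is a minimal sketch of the expression shown in the traceback. It does not import DSPy: `StubLM` and `lm_for_round` are hypothetical names invented for this illustration, and the default temperature of 0.0 is an assumption.

```python
class StubLM:
    """Hypothetical stand-in for an LM client; not a DSPy class."""

    def __init__(self, model, **kwargs):
        self.model = model
        # Assumed default; a real client would carry its own settings.
        self.kwargs = {"temperature": 0.0, **kwargs}

    def copy(self, **overrides):
        # Build a new instance with the same settings plus any overrides.
        return type(self)(self.model, **{**self.kwargs, **overrides})


def lm_for_round(lm, round_idx):
    # Same shape as the expression in the traceback: round 0 reuses the LM
    # as-is, later rounds ask for a copy with a nudged temperature.
    return lm.copy(temperature=0.7 + 0.001 * round_idx) if round_idx > 0 else lm


base = StubLM("mixtral")
first = lm_for_round(base, 0)   # same object; copy() is never called
second = lm_for_round(base, 1)  # new object with a temperature near 0.701
```

Because `copy` is only reached when `round_idx > 0`, a run with max_rounds=1 never exercises it, which would be consistent with the first pass finishing before the error appears.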

thiagodsd commented 1 month ago

Thank you for all the hard work maintaining this project!

> It seems that your LM object does not support .copy?

I think you're right, it probably doesn't. In my company, due to security concerns, we can't directly access language models via API. Instead, we use proprietary packages as middlemen or compiled models on platforms like SageMaker.

Do you think it would be a good idea to implement the copy method in the language model class before wrapping it in CustomLMClient? Or is there a way to tweak dspy.LM to handle this scenario?

arnavsinghvi11 commented 1 month ago

Hi @thiagodsd, implementing a copy method on your CustomLMClient, modeled on the one in dspy.LM, should help here.
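As a rough sketch of what such a copy method could look like on a custom client: the code below is self-contained and does not reproduce DSPy's actual implementation. `FakePredictor` is a hypothetical stand-in for the endpoint client, the base class is left out so the snippet runs on its own, and the default `temperature` and `max_tokens` values are assumptions.

```python
class FakePredictor:
    """Hypothetical stand-in for the endpoint client (e.g. a SageMaker Predictor)."""

    def __init__(self, endpoint_name):
        self.endpoint_name = endpoint_name


class CustomLMClient:
    # In the real setup this would subclass the DSPy LM base class;
    # it is omitted here so the sketch has no external dependencies.
    def __init__(self, model, **kwargs):
        self.provider = "default"
        self.model = model
        # Assumed defaults, overridable through keyword arguments.
        self.kwargs = {"temperature": 0.0, "max_tokens": 256, **kwargs}
        self.client = FakePredictor(endpoint_name=model)

    def copy(self, **kwargs):
        # Merge the current generation settings with the overrides and
        # build a fresh instance instead of mutating this one.
        merged = {**self.kwargs, **kwargs}
        return self.__class__(model=self.model, **merged)


lm = CustomLMClient(model="mixtral")
warmer = lm.copy(temperature=0.701)
```

Returning a new instance keeps the original client's settings untouched, so the base LM can be reused unchanged after bootstrapping.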

thiagodsd commented 1 month ago

Thanks for the suggestion, @arnavsinghvi11! I'll explore some workarounds in that direction.

It's a bit surprising that my CustomLMClient didn't inherit this method from the LM class... I'll dig deeper into why that might be happening and see if there's something specific in my implementation causing this. Appreciate your input!
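One way to narrow this down is to compare the two possible failure messages. The sketch below uses only plain Python; `NoCopyLM` and `describe_copy_failure` are hypothetical names for illustration. It shows that a client lacking the method names its own class in the error, whereas the 'NoneType' message in the traceback above arises when the object being copied is None.

```python
class NoCopyLM:
    """Hypothetical client that defines no copy method."""


def describe_copy_failure(lm):
    # Attempt the same kind of call the optimizer makes and report the error text.
    try:
        lm.copy(temperature=0.701)
    except AttributeError as exc:
        return str(exc)
    return "copy succeeded"


missing_method = describe_copy_failure(NoCopyLM())  # message names NoCopyLM
missing_object = describe_copy_failure(None)        # message names NoneType
```

If the message mentions NoneType, it may be worth checking which object is actually in scope when the copy runs, in addition to checking that the client defines copy.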

thiagodsd commented 4 weeks ago

Thank you, @arnavsinghvi11, for your helpful suggestion! I implemented the copy method, and it worked perfectly. I'll close this issue now.