stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

[ERROR] MIPROv2 + multi hop module results in IndexError #1390

Closed etwk closed 1 month ago

etwk commented 2 months ago

Running MIPROv2 on a multi-hop module raises IndexError: list index out of range.

Using the latest version: dspy-ai==2.4.13

Error details:

IndexError                                Traceback (most recent call last)
Cell In[185], line 29
     19 lm = llm_compile
     20 optimizer_MIPROv2 = MIPROv2(
     21     prompt_model=lm,
     22     task_model=lm,
   (...)
     27     track_stats=False,
     28 )
---> 29 optimized_program_MIPROv2 = optimizer_MIPROv2.compile(
     30     ContextVerdict(),
     31     trainset=train_HotPotQA,
     32     valset=validation_HotPotQA,
     33     num_batches=batches,
     34     max_bootstrapped_demos=2,
     35     max_labeled_demos=2,
     36     requires_permission_to_run=False,
     37     eval_kwargs=eval_kwargs,
     38 )

File /opt/conda/lib/python3.11/site-packages/dspy/teleprompt/mipro_optimizer_v2.py:291, in MIPROv2.compile(self, student, trainset, valset, num_batches, max_bootstrapped_demos, max_labeled_demos, eval_kwargs, seed, minibatch, program_aware_proposer, requires_permission_to_run)
    289 proposer.use_instruct_history = False
    290 proposer.set_history_randomly = False
--> 291 instruction_candidates = proposer.propose_instructions_for_program(
    292     trainset=trainset,
    293     program=program,
    294     demo_candidates=demo_candidates,
    295     N=self.n,
    296     prompt_model=self.prompt_model,
    297     T=self.init_temperature,
    298     trial_logs={},
    299 )
    300 for i, pred in enumerate(program.predictors()):
    301     instruction_candidates[i][0] = get_signature(pred).instructions

File /opt/conda/lib/python3.11/site-packages/dspy/propose/grounded_proposer.py:302, in GroundedProposer.propose_instructions_for_program(self, trainset, program, demo_candidates, prompt_model, trial_logs, N, T, tip)
    299         if pred_i not in proposed_instructions:
    300             proposed_instructions[pred_i] = []
    301         proposed_instructions[pred_i].append(
--> 302             self.propose_instruction_for_predictor(
    303                 program=program,
    304                 predictor=predictor,
    305                 pred_i=pred_i,
    306                 prompt_model=prompt_model,
    307                 T=T,
    308                 demo_candidates=demo_candidates,
    309                 demo_set_i=demo_set_i,
    310                 trial_logs=trial_logs,
    311                 tip=selected_tip,
    312             ),
    313         )
    314 return proposed_instructions

File /opt/conda/lib/python3.11/site-packages/dspy/propose/grounded_proposer.py:349, in GroundedProposer.propose_instruction_for_predictor(self, program, predictor, pred_i, prompt_model, T, demo_candidates, demo_set_i, trial_logs, tip)
    347 with dspy.settings.context(lm=prompt_model):
    348     prompt_model.kwargs["temperature"] = T
--> 349     proposed_instruction = instruction_generator.forward(
    350         demo_candidates=demo_candidates,
    351         pred_i=pred_i,
    352         demo_set_i=demo_set_i,
    353         program=program,
    354         data_summary=self.data_summary,
    355         previous_instructions=instruction_history,
    356         tip=tip,
    357     ).proposed_instruction
    358 prompt_model.kwargs["temperature"] = original_temp
    360 # Log the trace used to generate the new instruction, along with the new instruction itself

File /opt/conda/lib/python3.11/site-packages/dspy/propose/grounded_proposer.py:201, in GenerateModuleInstruction.forward(self, demo_candidates, pred_i, demo_set_i, program, previous_instructions, data_summary, max_demos, tip)
    199     matches = re.findall(pattern, init_content, re.MULTILINE)
    200     modules = [match[0].strip() for match in matches]
--> 201     module_code = modules[pred_i]
    203 module_description = self.describe_module(
    204     program_code=self.program_code_string,
    205     program_description=program_description,
   (...)
    208     max_depth=10,
    209 ).module_description
    211 # Generate an instruction for our chosen module

IndexError: list index out of range

Script:

#!/usr/bin/env python
# coding: utf-8

import os
import dspy
from dspy.evaluate import Evaluate

os.environ["OPENAI_API_KEY"] = "abc"

colbertv2 = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
llm_compile = dspy.OpenAI(model='google/gemma-2-9b-it', api_base="http://127.0.0.1:8000/v1/", max_tokens=100, stop='\n\n')

dspy.settings.configure(rm=colbertv2, lm=llm_compile)

# based on: https://github.com/stanfordnlp/dspy/blob/main/skycamp2023.ipynb

train_dataset = [
    ('Kevin Greutert directed the 2009 movie featuring Peter Outerbridge as William Easton.', "True"),
    ('The heir to the Rockefeller family fortune sponsored the Foxcatcher wrestling team.', "False"),
    ('Audie Murphy, the star of To Hell and Back, was born in 1925.', "True"),
    ('The first book of Gary Zukav received the Pulitzer Prize.', "False"),
    ('The Killing Season, a documentary about the Gilgo Beach Killer, debuted on Netflix.', "False"),
    ('John Braine, an English author, wrote "Room at the Top".', "True"),
    ('Butch Vig produced the album that included a re-recording of "Lithium".', "True")
]

dev_dataset = [
    ('E. L. Doctorow has a broader scope of profession than Julia Peterkin.', "True"),
    ('Right Back At It Again contains lyrics co-written by the singer born in Gainesville, Florida.', "True"),
    ('The party of the winner of the 1971 San Francisco mayoral election was founded in 1854.', "False"),
    ('Anthony Dirrell is the brother of Andre Dirrell, a super middleweight title holder.', "True"),
    ('The sports nutrition business established by Oliver Cookson is based in Kent, UK.', "False"),
    ('The actor who played roles in First Wives Club and Searching for the Elephant was born on February 13, 1980.', "True"),
    ('Kyle Moran was born in the town on the Thames River.', "False"),
    ('The actress who played the niece in the Priest film was born in Surrey, England.', "True"),
    ('The movie in which the daughter of Noel Harrison plays Violet Trefusis is called Portrait of a Marriage.', "True"),
    ('The father of the Princes in the Tower was born in 1445.', "False"),
    ('The River Tyne is near the Crichton Collegiate Church.', "True"),
    ('Renault purchased the team Michael Schumacher raced for in the 1995 Monaco Grand Prix in 2000.', "True"),
    ('André Zucca was a French photographer who worked with a German propaganda magazine published by the Luftwaffe.', "False")
]

train = [dspy.Example(statement=statement, answer=answer).with_inputs('statement') for statement, answer in train_dataset]
dev = [dspy.Example(statement=statement, answer=answer).with_inputs('statement') for statement, answer in dev_dataset]

from dsp.utils import deduplicate

class CheckStatementFaithfulness(dspy.Signature):
    """Verify that the statement is based on the provided context."""

    context = dspy.InputField(desc="facts here are assumed to be true")
    statement = dspy.InputField()
    verdict = dspy.OutputField(desc="True/False/Irrelevant indicating if statement is faithful to context")

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help retrieve info related to the statement."""
    context = dspy.InputField(desc="may contain relevant facts")
    statement = dspy.InputField()
    query = dspy.OutputField()

class ContextVerdict(dspy.Module):
    def __init__(self, passages_per_hop=3, count=3):
        super().__init__()
        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(count)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_verdict = dspy.ChainOfThought(CheckStatementFaithfulness)
        self.count = count

    def forward(self, statement):
        context = []
        for hop in range(self.count):
            query = self.generate_query[hop](context=context, statement=statement).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        verdict = self.generate_verdict(context=context, statement=statement)
        pred = dspy.Prediction(answer=verdict.verdict, rationale=verdict.rationale, context=context)
        return pred

# reference: https://github.com/stanfordnlp/dspy/blob/main/examples/multi-input-output/beginner-multi-input-output.ipynb

from dspy.teleprompt import MIPROv2

NUM_THREADS = 24
metric_EM = dspy.evaluate.answer_exact_match
eval_kwargs = {"num_threads": NUM_THREADS, "display_progress": False, "display_table": 0}

evaluate_MIPROv2 = Evaluate(
    devset=dev,
    metric=metric_EM,
    **eval_kwargs
)

n = 1 #10  # The number of instructions and fewshot examples that we will generate and optimize over
batches = 1 #30  # The number of optimization trials to be run (we will test out a new combination of instructions and fewshot examples in each trial)
temperature = 1  # The temperature configured for generating new instructions

lm = llm_compile
optimizer_MIPROv2 = MIPROv2(
    prompt_model=lm,
    task_model=lm,
    metric=metric_EM,
    num_candidates=n,
    init_temperature=temperature,
    verbose=False,
    track_stats=False,
)
optimized_program_MIPROv2 = optimizer_MIPROv2.compile(
    ContextVerdict(),
    trainset=train,
    valset=dev,
    num_batches=batches,
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    requires_permission_to_run=False,
    eval_kwargs=eval_kwargs,
)

I tried both the local model google/gemma-2-9b-it and gpt-4o-mini from OpenAI; both give the same error.

I'm able to avoid the error by adding one commented-out line inside the ContextVerdict module, just above the existing assignment:

# self.generate_query = dspy.ChainOfThought(GenerateSearchQuery)  # IMPORTANT: solves error `list index out of range`
self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(count)]

okhat commented 2 months ago

Ah thanks for the note, tagging @XenonMolecule

arnavsinghvi11 commented 2 months ago

Hi @etwk, try enabling experimental mode: dspy.settings.configure(rm=colbertv2, lm=llm_compile, experimental=True).

This error happens because MIPRO summarizes the DSPy program's code and depends on parsing a certain module structure to retrieve the right code. Since certain new releases point towards the experimental setup (which will be comprehensively updated in future releases), I would advise checking whether enabling experimental mode fixes this error.
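
A rough sketch of this failure mode, for illustration only. The variable names (pattern, init_content, modules, pred_i) come from the traceback above, but the regular expression itself is an assumption and may differ from what grounded_proposer.py actually uses. The idea is that a line-based search over the __init__ source finds one entry per matching line, while the program can hold more predictors than that when they are built in a list comprehension.

```python
import re

# ASSUMPTION: a hypothetical line-based pattern; the real one in DSPy may differ.
PATTERN = r"^(.*dspy\.(ChainOfThought|Predict).*)$"

# The __init__ body of ContextVerdict from the script above, as source text.
INIT_SOURCE = '''
def __init__(self, passages_per_hop=3, count=3):
    super().__init__()
    self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(count)]
    self.retrieve = dspy.Retrieve(k=passages_per_hop)
    self.generate_verdict = dspy.ChainOfThought(CheckStatementFaithfulness)
    self.count = count
'''


def find_module_lines(init_content):
    """Return one stripped source line per regex match."""
    matches = re.findall(PATTERN, init_content, re.MULTILINE)
    return [match[0].strip() for match in matches]


modules = find_module_lines(INIT_SOURCE)
num_predictors = 3 + 1  # three query generators plus one verdict predictor

print(len(modules), "matched lines for", num_predictors, "predictors")

# Indexing by predictor position fails once pred_i passes the matched lines.
for pred_i in range(num_predictors):
    try:
        modules[pred_i]
    except IndexError:
        print("IndexError at pred_i =", pred_i)
        break
```

Under this assumed pattern, a commented-out line that mentions dspy.ChainOfThought would also count as a match, so the list of matched lines can change without the program itself changing.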

etwk commented 2 months ago

Hi @arnavsinghvi11, thanks for the insight. I tried experimental=True and got the same error.

MohammedAlhajji commented 1 month ago

Getting the same error here with GPT-4o. @etwk Have you found a workaround or a solution yet?

etwk commented 1 month ago

Yes, please find it at the end of the first post.

Adding one commented-out line bypassed this issue for me:

# self.generate_query = dspy.ChainOfThought(GenerateSearchQuery)  # IMPORTANT: solves error `list index out of range`

shiluanzzz commented 1 month ago

I also encountered the same error when using MIPRO V2. Is there any new progress?

isaacbmiller commented 1 month ago

I'll fix this today or tomorrow @okhat

okhat commented 1 month ago

I think this is now fixed.