stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

format_handler error when setting non-string as type for OutputField #577

Closed stantonius closed 7 months ago

stantonius commented 8 months ago

Thanks for all your hard work on this project.

Setting the type of a dspy.OutputField to a list, as in the code below,

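import dspy
from dspy.evaluate import answer_exact_match
from dspy.teleprompt import BootstrapFewShot

class SRLSignature(dspy.Signature):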
    """
    Generate Semantic Role Labeling (SRL) for a given sentence.
    """
    sentence: str = dspy.InputField()
    predicates: list[str] = dspy.OutputField(
        desc="The list of predicates in the sentence",
        format="answers"
    )

items = df.groupby('sentence').apply(lambda x: {'sentence': x.name, 'predicates': list(set(x['predicate']))}).tolist()
examples = [dspy.Example(**item).with_inputs('sentence') for item in items]

class SRLModule(dspy.Module):
    """
    Generate Semantic Role Labeling (SRL) for a given sentence.
    """
    def __init__(self):
        super().__init__()
        self.runner = dspy.Predict(SRLSignature)

    def forward(self, sentence):
        return self.runner(sentence=sentence)

mod = SRLModule()
teleprompter = BootstrapFewShot(metric = answer_exact_match, max_bootstrapped_demos=8, max_labeled_demos=4, max_rounds=2)
compiled_program = teleprompter.compile(student = mod, teacher = mod, trainset=examples)

returns this error when running the compiler:

AssertionError: Need format_handler for predicates of type <class 'list'>

I scoured the docs and code but couldn't make sense of this error. I know it has something to do with Template V2, but past that I gave up.

So the questions are:

I think at the very least we need a better error message. I would be happy to help with this, I just need to understand what's happening here first :)

By the way, we get the same error when using a non-string type for the dspy.InputField too

Thanks for any guidance

okhat commented 8 months ago

The easiest thing to do is to format your item as a string before you pass it to the method

okhat commented 8 months ago

But you can pass format=fn where fn is a function that will create a string out of your list.

from dsp import passages2text

is a function that does that
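As a rough sketch of what such an fn could look like: the helper below is hypothetical (join_items is not part of DSPy, and passages2text may well format lists differently). It only assumes that format accepts a callable that returns a string.

```python
from typing import Sequence, Union


def join_items(items: Union[str, Sequence[str]]) -> str:
    """Turn a list of strings into one comma-separated string.

    Hypothetical helper, not part of DSPy. A plain string is returned
    unchanged so the function is safe on already-formatted values.
    """
    if isinstance(items, str):
        return items
    return ", ".join(str(item) for item in items)


# Assumed usage with the `format` argument discussed in this thread:
#     predicates = dspy.OutputField(desc="...", format=join_items)

print(join_items(["ate", "slept"]))  # prints "ate, slept"
```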

stantonius commented 8 months ago

I think you want:

    format=list

OK this was annoyingly simple. Thanks for the quick reply.

However it only works when I change the output field name to "answer":

class SRLSignature(dspy.Signature):
    """
    Generate Semantic Role Labeling (SRL) for a given sentence.
    """
    sentence: str = dspy.InputField()
    # predicates: list[str] = dspy.OutputField(
    #     desc="The list of predicates in the sentence",
    #     format=list
    # )
    answer: list[str] = dspy.OutputField(
        desc="The list of predicates in the sentence",
        format=list
    )

Now the compiler seems to run

stantonius commented 8 months ago

But you can pass format=fn where fn is a function that will create a string out of your list.

from dsp import passages2text

is a function that does that

This also has the compiler running, so long as again I set my output to "answer".

Not sure which one is the best - maybe both?

I also know there is a big rewrite going on, so to summarise the things I've learnt that may be addressed in the rewrite/updated docs:

  1. The format argument of a signature field can take the sequence type list. It can also take a function (e.g. passages2text).
  2. The above solutions only work if you explicitly set the output field in your signature to answer. Wondering if this works for scenarios where you are asking for multiple OutputFields in the signature?
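
One possible explanation for point 2, offered as an assumption rather than a confirmed diagnosis: the answer_exact_match metric used above appears to read a field literally named answer from both the example and the prediction, so a signature whose output is called predicates may need its own metric. A minimal sketch of such a metric follows. predicates_match is a hypothetical name, and splitting string predictions on commas is an assumption about how the model's output is formatted.

```python
from types import SimpleNamespace


def predicates_match(example, pred, trace=None) -> bool:
    """Hypothetical metric: order-insensitive match on the `predicates` field.

    Follows the (example, pred, trace=None) calling convention used for
    DSPy metrics. Comparison ignores case and surrounding whitespace.
    """
    gold = {p.strip().lower() for p in example.predicates}
    guess = pred.predicates
    if isinstance(guess, str):
        # Assumption: a string prediction is a comma-separated list.
        guess = guess.split(",")
    return gold == {p.strip().lower() for p in guess if p.strip()}


# Stand-ins for dspy.Example / dspy.Prediction, just to exercise the metric.
example = SimpleNamespace(predicates=["ate", "Slept"])
prediction = SimpleNamespace(predicates="slept, ate")
print(predicates_match(example, prediction))  # prints True
```

If this is indeed the cause, passing metric=predicates_match to BootstrapFewShot would be the change to try.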

Let me know if I have misrepresented anything. Otherwise, thanks again both for your help

stantonius commented 7 months ago

Closing as I think the TypedPredictors seems to be doing the trick for me

brando90 commented 1 month ago

Closing as I think the TypedPredictors seems to be doing the trick for me

@stantonius do you mind sharing your solution with TypedPredictors? As far as I know, TypedPredictors aren't the same as dspy.OutputField/dspy.InputField, so I'm not sure how you are solving this, but I would love to learn!

brando90 commented 1 month ago

Some things I tried that failed:

# 2. Define ICL Module that takes multiple few-shot examples (problem-solution pairs)
class ICLMathModule(dspy.Signature):
    """Use ICL with multiple math problem-solution pairs to generate an answer."""
    # context = dspy.InputField(desc="Math context for ICL")
    examples = dspy.InputField(type=list[str], desc="List of problem-solution pairs for few-shot ICL")
    question = dspy.InputField(type=str, desc="New math question to solve")
    answer = dspy.OutputField(type=str, desc="Answer generated by ICL")

That is, just passing the type directly to each field.

brando90 commented 1 month ago

The easiest thing to do is to format your item as a string before you pass it to the method

This seems to work so far:

    def forward(self, contexts: list[str], question: str):
        _contexts: str = '\n'.join(contexts)
        # Step 1: Generate math problem-solution pairs from the list of contexts
        result = self.math_problems(contexts=_contexts)
        generated_qa_pairs = result.question_answer_pairs
        ...
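
The same idea pulled out of the module so it can be checked on its own. Both helper names are made up for illustration, and the newline-separated layout is only an assumption about what the downstream field expects. The second helper goes the other way, turning a string output back into a list.

```python
def contexts_to_text(contexts: list[str]) -> str:
    """Join contexts into one newline-separated string for a string-typed field.

    Blank entries are skipped and surrounding whitespace is trimmed.
    """
    return "\n".join(c.strip() for c in contexts if c.strip())


def text_to_items(text: str) -> list[str]:
    """Split a newline-separated string back into a list of non-empty items."""
    return [line.strip() for line in text.splitlines() if line.strip()]


joined = contexts_to_text(["2 + 2 = 4", "", " 3 * 3 = 9 "])
print(text_to_items(joined))  # prints ['2 + 2 = 4', '3 * 3 = 9']
```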