stanfordnlp / dspy

DSPy: The framework for programming—not prompting—language models
https://dspy.ai
MIT License

dspy.Assert/Suggest #1873

Open fdzr opened 2 days ago

fdzr commented 2 days ago

Suppose I have the following program:

def custom_evaluate(dataset, metric, model, debug=False):
    acc = 0

    for item in dataset:
        pred = model(
            sentence1=item.sentence1,
            sentence2=item.sentence2,
            target_word=item.target_word,
        )

        # Count exact matches against the gold answer.
        if pred.answer == item.answer:
            acc += 1

        if debug:
            print("Prediction: ", pred.answer)

    print(f"Accuracy: {acc * 100 / len(dataset)}")

and my model or module has a dspy.Suggest instruction. I would like to know how to handle this type of output:

2024/11/28 15:33:48 INFO dspy.primitives.assertions: SuggestionFailed:

That is, I want to calculate accuracy over the examples with the right format, and skip or separately count the examples with a bad format. A sketch of what I have in mind is below.
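For example, something along these lines, where is_well_formed is a hypothetical check mirroring whatever condition the module's dspy.Suggest enforces (a sketch, not an official API):

    def is_well_formed(answer):
        # Hypothetical format check -- replace with the same condition
        # used in the module's dspy.Suggest call.
        return answer in {"yes", "no"}

    def custom_evaluate_with_format_check(dataset, model):
        correct = 0
        bad_format = 0

        for item in dataset:
            pred = model(
                sentence1=item.sentence1,
                sentence2=item.sentence2,
                target_word=item.target_word,
            )

            # A failed dspy.Suggest is only logged, so the prediction
            # can still come back malformed; re-check it here.
            if not is_well_formed(pred.answer):
                bad_format += 1
                continue

            if pred.answer == item.answer:
                correct += 1

        scored = len(dataset) - bad_format
        accuracy = correct * 100 / scored if scored else 0
        print(f"Accuracy over well-formed examples: {accuracy}")
        print(f"Bad-format examples: {bad_format}")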

okhat commented 2 days ago

Hey @fdzr, I don't understand the question. But I want to signal that assertions are a highly advanced feature.

Most of the format constraints you likely need can be specified in the signature itself using types.
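For instance, constraining the answer field with a Literal type lets DSPy's adapters enforce and parse the output format for you. A sketch, with field names taken from the program above and the task description inferred:

    from typing import Literal

    import dspy

    class WordSenseCheck(dspy.Signature):
        """Decide whether the target word is used with the same meaning in both sentences."""

        sentence1: str = dspy.InputField()
        sentence2: str = dspy.InputField()
        target_word: str = dspy.InputField()
        # The Literal annotation restricts the output to a fixed set of values.
        answer: Literal["yes", "no"] = dspy.OutputField()

    model = dspy.Predict(WordSenseCheck)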

fdzr commented 2 days ago

@okhat When you write your module with dspy.Assert or dspy.Suggest, you are instructing the model to re-evaluate certain examples if the answer does not satisfy some condition. However, if the format is incorrect, DSPy will query the LLM a finite number of times, as outlined in this documentation, in an attempt to correct the format. If the LLM is unable to produce the correct format, you may encounter the following output:

2024/11/28 15:33:48 INFO dspy.primitives.assertions: SuggestionFailed:

My question is: How can I capture this in order to classify the example as a bad-format instance?
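For instance, I can imagine something like the following, but I am not sure it is the intended way. A sketch assuming the legacy assertions in DSPy 2.x, where dspy.Suggest only logs SuggestionFailed while dspy.Assert raises DSPyAssertionError once backtracking is exhausted (the import path is an assumption; please verify against the installed version):

    from dspy.primitives.assertions import DSPyAssertionError  # assumed location in DSPy 2.x

    def custom_evaluate_catching_failures(dataset, model):
        correct, bad_format = 0, 0

        for item in dataset:
            try:
                pred = model(
                    sentence1=item.sentence1,
                    sentence2=item.sentence2,
                    target_word=item.target_word,
                )
            except DSPyAssertionError:
                # Raised when dspy.Assert exhausts its retries; dspy.Suggest
                # never raises, so its failures must be re-checked on the output.
                bad_format += 1
                continue

            if pred.answer == item.answer:
                correct += 1

        print(f"Correct: {correct}, bad format: {bad_format}")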