stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Wrong parsing of output fields #847

Open DSLituiev opened 2 months ago

DSLituiev commented 2 months ago

The OutputFields in a custom dspy.Signature are not being parsed correctly:

import dspy

# Signature with two inputs and two outputs that the LM must fill in.
class EquivalenceC(dspy.Signature):
    """Compare source and target:
    [...]
    """

    source = dspy.InputField(desc="extensive background information")
    target = dspy.InputField(desc="information that needs to be verified")
    match = dspy.OutputField(desc="one word of: yes, no, approximately; note that 'approximately' implies within 10% error.")
    correction = dspy.OutputField(desc="corrected replacement for 'target' if needed")

# Custom rationale field passed to ChainOfThought in place of its default.
rationale_type = dspy.OutputField(
    prefix="Reasoning: Let's think step by step in order to",
    desc="${understand whether the two inputs match and provide correction if needed}. The values provided are ...",
)

# `source` and `target` are strings defined elsewhere; an LM must already be configured.
dspy.ChainOfThought(EquivalenceC, rationale_type=rationale_type)(source=source, target=target)

The result is:

Prediction(
    rationale='understand whether the two inputs match and provide correction if needed. The values provided are both units of time, but they are not exactly equivalent. There are 7 days in a week, so 4 weeks would be equal to 28 days. Therefore, the values mentioned in the target are not exactly equivalent to the information within the source.\n\nCorrection: 4 weeks is approximately equivalent to 28 days.',
    match='',
    correction='no\n\nCorrection: 4 weeks is approximately equivalent to 28 days.'
)

Inspecting the LLM log:

Reasoning: Let's think step by step in order to understand whether the two inputs match and provide correction if needed. The values provided are both units of time, but they are not exactly equivalent. There are 7 days in a week, so 4 weeks would be equal to 28 days. Therefore, the values mentioned in the target are not exactly equivalent to the information within the source. Correction: 4 weeks is approximately equivalent to 28 days.

Match: no

Correction: 4 weeks is approximately equivalent to 28 days.

arnavsinghvi11 commented 2 months ago

Hi @DSLituiev, which LM is this for? Support for the expected generation and parsing behavior with chat-model backends is WIP. You can work around some of this with proper stopping conditions or by parsing the generations externally where needed.
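
As a rough illustration of the external parsing mentioned above, the raw completion could be split on field prefixes that appear at the start of a line. This is only a sketch in plain Python, not DSPy's own parser: the helper name `parse_fields` is hypothetical, the `raw` string is abridged from the log in this issue, and it assumes the LM puts each field label ("Match:", "Correction:") on its own line, as it does in that log.

```python
import re

def parse_fields(completion, prefixes):
    """Split a raw completion into fields keyed by line-initial prefixes.

    Hypothetical helper for illustration; not part of DSPy.
    """
    # Match a prefix only at the start of a line, so a "Correction:" that
    # appears mid-sentence inside the reasoning is not treated as a field.
    pattern = re.compile(
        r"^(%s):[ \t]*" % "|".join(re.escape(p) for p in prefixes),
        flags=re.MULTILINE,
    )
    matches = list(pattern.finditer(completion))
    fields = {}
    for i, m in enumerate(matches):
        # A field's value runs until the next line-initial prefix (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(completion)
        fields[m.group(1).lower()] = completion[m.end():end].strip()
    return fields

# Abridged version of the logged completion from this issue.
raw = (
    "Reasoning: Let's think step by step in order to understand whether the two "
    "inputs match. Correction: 4 weeks is approximately equivalent to 28 days.\n\n"
    "Match: no\n\n"
    "Correction: 4 weeks is approximately equivalent to 28 days."
)

parsed = parse_fields(raw, ["Reasoning", "Match", "Correction"])
print(parsed["match"])
print(parsed["correction"])
```

Anchoring on line starts is what keeps the "Correction:" inside the reasoning from swallowing the `match` field here. It would still fail if the model emitted the labels inline or renamed them.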

DSLituiev commented 2 months ago

This is with GPT-3.5; it works better with GPT-4. Still, the GPT-3.5 output looks reasonable enough to be parseable.

canada4663 commented 2 months ago

@DSLituiev Were you able to get past this issue?

DSLituiev commented 2 months ago

No. Using GPT-4 for now.

shantanubhusari commented 1 month ago

This issue also occurs with the Bedrock Anthropic Sonnet LM, using a similar signature with the ChainOfThought module. I also observed that on each invocation, the predictor calls the invoke API twice and updates the request args.

jeeyung commented 1 week ago

Have you figured out a better parsing approach? I came across the same issue with Bedrock Anthropic Sonnet.