stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Default prompting strategy suboptimal? #409

Open aidangomez opened 8 months ago

aidangomez commented 8 months ago

I get the sense that DSPy isn't telling the model enough about the framework it's operating within for it to respond properly to the prompts it's given.

As a simple example, take the quick start tutorial, where you construct a simple QA program that uses chain of thought:

import dspy

question = "Who is the prime minister two before the current Canadian prime minister?"

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

generate_answer = dspy.ChainOfThought(BasicQA)
result = generate_answer(question=question)
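(This assumes an LM has already been configured as in the quick start; a rough sketch follows, with the model name purely illustrative.)

import dspy

# Configure the default LM before running the program above
# (model name is illustrative; any supported provider works).
turbo = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=turbo)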

The resulting prompt is:

Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: often between 1 and 5 words

---

Question: Who is the prime minister two before the current Canadian prime minister?
Reasoning: Let's think step by step in order to

This results in an answer of Stephen Harper from both GPT-3.5 and Cohere (both of which are wrong).

If you (as a human) were handed this, it's not entirely clear what the submitter is trying to get you to do.

One reasonable reading of the prompt would be:

I think performance and clarity could be improved by providing a bit more guidance and structure to the model via a preamble or some other similar strategy.

Here is one example of how we could adjust the prompt to give it more structure:

You are going to act as a DSPy program that generates a JSON-structure output that I will describe below. You'll be given an "Objective" to carry out, along with a response format specification. The format specification will give you the output keys to generate, in the order they should be generated, along with a description of what you are expected to provide in their values. In the formatting specification, I will annotate the information that will be provided for you using ${description} where `description` is a description of what content will replace the tag.

Objective: Answer questions with short factoid answers.

---

Follow the following format.

"question" : ${question},
"reasoning": please begin with "Let's think step by step to arrive at the answer. We ...",
"answer": often between 1 and 5 words

---

Question: Who is the prime minister two before the current Canadian prime minister?

Here the response from Cohere is:

{
  "question": "Who is the prime minister two before the current Canadian prime minister?",
  "reasoning": "Let's think step by step to arrive at the answer. Assuming the current Prime Minister is Justin Trudeau, his predecessor is Stephen Harper. Before him is the one we seek.",
  "answer": "Paul Martin"
}

Which is correct! Correctness doesn't really matter as much as the fact that the prompt and the response are both now much more explicit. I think relying on JSON is probably the best bet, given that structured responses are going to become standard for both closed and open-source model providers.
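One practical upside of the JSON route is that the completion is trivial to map back onto the signature's output fields. A minimal sketch, purely illustrative (the helper below is not part of DSPy):

import json

# Illustrative only: map a JSON-formatted completion back onto the
# signature's output fields, erroring if the model omitted any of them.
def parse_json_completion(completion, output_fields=("reasoning", "answer")):
    data = json.loads(completion)
    missing = [field for field in output_fields if field not in data]
    if missing:
        raise ValueError(f"Completion is missing expected fields: {missing}")
    return {field: data[field] for field in output_fields}

parsed = parse_json_completion("""{
  "question": "Who is the prime minister two before the current Canadian prime minister?",
  "reasoning": "Let's think step by step to arrive at the answer. ...",
  "answer": "Paul Martin"
}""")
# parsed == {"reasoning": "...", "answer": "Paul Martin"}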

Let me know your thoughts on the above and whether what I'm describing makes sense or is unclear.

smwitkowski commented 8 months ago

@aidangomez I like your distinction between the "preamble" and the "objective". Adding something like this would go a long way toward improving the output via more explicit instructions. I'm not sure if this would be added to the dspy.Signature, a module like dspy.ChainOfThought, or somewhere else.

I think we should prioritize implementing this concept first. Then, we can dive into the proposition you made regarding structured outputs. There have been some discussions about how to add this functionality (https://github.com/stanfordnlp/dspy/issues/264). I'd like to see something similar to 'Outlines' where the "search space" is limited based on the defined pattern, but I know there are some logistical hurdles there.

kevon217 commented 8 months ago

A preamble or high-level context/goal/awareness feature would be nice.

okhat commented 8 months ago

Thanks @aidangomez, this makes sense to me. We've started a refactor of the internals (#390) and I'll think about the right place to do this.

I confess I'm conceptually uncomfortable with having a global "DSPy instruction" that exists in all prompts but I do agree that it adds important context and may improve most models' responses, even before optimization/compiling.

I'll update here again this week.

aidangomez commented 8 months ago

Thanks all, @okhat could you say more about the discomfort?

If I had to guess, I would assume the concern is about hurting the generality of DSPy by providing an instruction that might limit DSPy's applicability?

I think that's a fair criticism, although I do think the gains from giving the model context about the framework it's operating within are important. It should definitely be optional; I guess the question is whether it's "on by default" or not.

A wholesale alternative to a preamble would be for Cohere, OpenAI, Meta, etc. to each incorporate training data that looks like DSPy, so our models can recognise they're operating in the DSPy context and respond in the format DSPy expects. This is brittle, since upstream DSPy changes would then need to be propagated backwards into new generations of models, so it's pretty much not a viable option.

thomasahle commented 8 months ago

Would it be a solution for DSPy to have more flexibility in templates? E.g. you could use a JSONTemplate that formats all inputs/outputs as JSON, a ClassicTemplate that uses the current DSPy "Name: data" format, and maybe a ToolsTemplate that tries to use tool syntax for everything. Then people can pick what works better for their use-case or the LM they are using.
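Roughly what I have in mind, as a sketch (the class names and interface below are hypothetical, not existing DSPy APIs):

import json

# Hypothetical template classes; none of these exist in DSPy today.
class ClassicTemplate:
    """Renders the current "Field: value" style prompt."""
    def render(self, instructions, fields):
        lines = [instructions, "", "Follow the following format.", ""]
        lines += [f"{name.capitalize()}: {desc}" for name, desc in fields.items()]
        return "\n".join(lines)

class JSONTemplate:
    """Asks the model to respond with a JSON object instead."""
    def render(self, instructions, fields):
        return f"{instructions}\n\nRespond with a JSON object:\n{json.dumps(fields, indent=2)}"

# Pick whichever works better for your use-case or LM:
template = JSONTemplate()
prompt = template.render(
    "Answer questions with short factoid answers.",
    {"question": "${question}", "answer": "often between 1 and 5 words"},
)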

okhat commented 8 months ago

The idea was always to have Adapters — lightweight exchangeable translators between signatures (filled with optimized instructions and/or demos) and final LM calls.

https://github.com/stanfordnlp/dspy/pull/138

We can revisit this in the current refactor — particularly interested in @CyrusOfEden ’s thoughts on this for #424

okhat commented 8 months ago

If we have Adapters, we can have a CohereAdapter that is designed to work well at mapping optimized DSPy parts with whatever frozen decisions are good for Cohere.

The same goes for any special backend — OpenAI function calling, Outlines, etc.
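Very roughly, and only as a sketch of the shape (this is not the interface we'll actually land on):

# Sketch only: the Adapter interface here is hypothetical.
class Adapter:
    """Translates a signature (instructions, fields, demos) into a concrete LM call
    and parses the completion back into output fields."""
    def format(self, signature, demos, inputs):
        raise NotImplementedError

    def parse(self, signature, completion):
        raise NotImplementedError

class CohereAdapter(Adapter):
    """Freezes whatever prompt/format decisions work well for Cohere models,
    e.g. the JSON-style preamble proposed above, while the signature's
    instructions and demos stay optimizable."""
    def format(self, signature, demos, inputs):
        ...  # emit the preamble plus a JSON format specification

    def parse(self, signature, completion):
        ...  # map the JSON completion back onto the signature's output fields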

ryanh-ai commented 6 months ago

Any additional thoughts or plans on having an Adapter strategy that allows model-specific prompting strategies? Claude, for example, uses XML as part of its notation to draw attention to and label key elements of a prompt.

CyrusOfEden commented 6 months ago

@canada4663 it's coming in the backend-refactor branch — we're exposing a prepare_request that lets you override the prompts, messages, and params as you see fit.

Our TODOs before merging are roughly:

[ ] Merge main into backend-refactor
[ ] Create deprecation notices for the existing dsp.X modules
[ ] Bump the DSPy version
[ ] Merge into main

ryanh-ai commented 6 months ago

Very cool, will check out that branch

elsatch commented 5 months ago

I was thinking about this exact topic last week, going in circles around how DSPy could be optimized for syntax like Claude's XML (or any other future adaptations for Mistral, Cohere, etc.).

One idea worth exploring would be using the prompt libraries from the model creators to build a best-practices dataset, then grounding the adapter on those examples.

This is one of the existing libraries for Claude model: https://docs.anthropic.com/claude/prompt-library

theta-lin commented 3 months ago

@okhat I think it would be great if some priority could be placed on this issue, as having hard-coded prompts greatly contradicts DSPy's design philosophy of declarative signatures that don't depend on the features of any particular LM (which I interpreted mostly from this HN post).

Suggestion for a Quick Fix

It appears that when using the Predict module, the LM completions come from either https://github.com/stanfordnlp/dspy/blob/d8b8909773fc31e72cec093db2f26109590e524e/dspy/predict/predict.py#L137-L140 or https://github.com/stanfordnlp/dspy/blob/d8b8909773fc31e72cec093db2f26109590e524e/dspy/predict/predict.py#L162-L166, depending on whether experimental features are enabled in the settings.

However, while signature_to_template() has an argument for a custom adapter, there appears to be no way to either pass in a custom adapter or specify one in the global settings. Thus, simply allowing a custom adapter to be specified for Predict would be sufficient for providing a custom prompt format.

Better Prompt Format Specification

I think using something like a Jinja template, as brought up in #996, might be a neater way to specify an adapter than pure Python code.
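For instance, something along these lines (just a sketch using jinja2 directly; the field names and layout are illustrative):

from jinja2 import Template

# Sketch of specifying the prompt format as a Jinja template rather than in Python code.
PROMPT = Template("""\
{{ instructions }}

---

Follow the following format.

{% for name, desc in fields.items() -%}
{{ name | capitalize }}: {{ desc }}
{% endfor %}
---

Question: {{ question }}
""")

print(PROMPT.render(
    instructions="Answer questions with short factoid answers.",
    fields={"question": "${question}", "answer": "often between 1 and 5 words"},
    question="Who is the prime minister two before the current Canadian prime minister?",
))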

Current Workaround