stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

Question: Assertions vs typed predictor + enum for multiple choice question #1247

Open theta-lin opened 1 month ago

theta-lin commented 1 month ago

I wonder what would be the best way to implement a multiple choice question in DSPy. For example, consider this problem in a RAG pipeline: you want to choose the best retriever type, either "vector" or "keyword", to query a database for relevant information. How would you ask the LM to do this?

One possible way might be with ChainOfThought + Suggestion:

import dspy


class RetrieverSelector(dspy.Signature):
    """Choose the best retriever type for querying the database for texts that could answer the question."""

    question = dspy.InputField(desc="The question to be answered.")
    retriever_type = dspy.OutputField(
        desc="The best type of retriever to use for the given question.\n"
        '"vector": Retrieves texts that are semantically similar to the query.\n'
        '"keyword": Retrieves texts that contain the same keywords used in the query.'
    )

retriever_selector = dspy.ChainOfThought(RetrieverSelector)
s = retriever_selector(question=question)
dspy.Suggest(
    s.retriever_type in ["vector", "keyword"],
    'The retriever type should be either "vector" or "keyword".',
)
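
For dspy.Suggest to actually drive a retry, the call would typically live inside a dspy.Module with assertions activated. A minimal sketch of that wrapping (assuming the assert_transform_module/backtrack_handler helpers from dspy.primitives.assertions; the SelectRetriever name is just made up here):

import dspy
from dspy.primitives.assertions import assert_transform_module, backtrack_handler


class SelectRetriever(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retriever_selector = dspy.ChainOfThought(RetrieverSelector)

    def forward(self, question):
        s = self.retriever_selector(question=question)
        # If the suggestion fails, DSPy backtracks and re-prompts with the feedback message.
        dspy.Suggest(
            s.retriever_type in ["vector", "keyword"],
            'The retriever type should be either "vector" or "keyword".',
        )
        return s


# Wrap with the default backtracking handler, or equivalently call .activate_assertions().
selector = assert_transform_module(SelectRetriever(), backtrack_handler)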

The other way would be using TypedChainOfThought + Enum:

from enum import Enum


class RetrieverType(Enum):
    VECTOR = "vector"
    KEYWORD = "keyword"

class TypedRetrieverSelector(dspy.Signature):
    """Choose the best retriever type for querying the database for texts that could answer the question."""

    question: str = dspy.InputField(desc="The question to be answered.")
    retriever_type: RetrieverType = dspy.OutputField(
        desc="The best type of retriever to use for the given question.\n"
        '"vector": Retrieves texts that are semantically similar to the query.\n'
        '"keyword": Retrieves texts that contain the same keywords used in the query.'
    )

typed_retriever_selector = dspy.TypedChainOfThought(TypedRetrieverSelector)
ts = typed_retriever_selector(question=question)
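
If the typed parse succeeds, retriever_type should come back as a RetrieverType enum member rather than a raw string, so downstream code would compare against the enum or read .value (a small illustrative snippet; query_db is a made-up placeholder):

ts = typed_retriever_selector(question="Who is Jane Doe?")

# The typed output is parsed into the annotated Enum, not a plain string.
if ts.retriever_type == RetrieverType.KEYWORD:
    query_db(kind=ts.retriever_type.value)  # "keyword"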

I think the first approach appears to work better for now, which can be illustrated with some prompting with Meta-Llama-3-8B-Instruct. Note that the exact conversation has been altered for privacy reasons, but it basically looks like this.

The first approach

Prompt:

Choose the best retriever type for querying the database for texts that could answer the question.

---

Follow the following format.

Question: The question to be answered.

Reasoning: Let's think step by step in order to ${produce the retriever_type}. We ...

Retriever Type: The best type of retriever to use for the given question. "vector": Retrieves texts that are semantically similar to the query. "keyword": Retrieves texts that contain the same keywords used in the query.

---

Question: Who is John Doe?

Reasoning: Let's think step by step in order to Question: Who is John Doe? Reasoning: Let's think step by step in order to produce the best retriever type. We need to find the exact information about John Doe, so we should look for texts that contain the specific keywords and phrases related to him.

Retriever Type: keyword

---

Question: Who is Jane Doe?

Reasoning: Let's think step by step in order to

Output:

Question: Who is Jane Doe?

Reasoning: Let's think step by step in order to determine the best retriever type. Since we are looking for a specific person, a "keyword" retriever type would be more suitable.

Retriever Type: keyword

The second approach

Prompt:

Choose the best retriever type for querying the database for texts that could answer the question.

---

Follow the following format.

Question: The question to be answered.

Reasoning: Let's think step by step in order to ${produce the retriever_type}. We ...

Retriever Type:
The best type of retriever to use for the given question.
"vector": Retrieves texts that are semantically similar to the query.
"keyword": Retrieves texts that contain the same keywords used in the query.. Respond with a single JSON object. JSON Schema: {"$defs": {"RetrieverType": {"enum": ["vector", "keyword"], "title": "RetrieverType", "type": "string"}}, "properties": {"value": {"$ref": "#/$defs/RetrieverType"}}, "required": ["value"], "title": "Output", "type": "object"}

---

Question: Who is John Doe?

Reasoning: Let's think step by step in order to
{
"value": "keyword"
}

Retriever Type:
{
"value": "keyword"
}

---

Question: Who is Jane Doe?

Reasoning: Let's think step by step in order to

Response:

{
"value": "keyword"
}

Therefore, I found that using assertions with normal ChainOfThought might be a better approach, as there is actual reasoning instead of simply repeating the output. I wonder whether there is an explanation for why TypedChainOfThought simply repeats the output as its reasoning?

I also tried to get the LM to do the query writing, so that the query passed to the retriever would not be the original question but could, for example, be keywords when using the keyword retriever. Thus, I tried adding

query: str = dspy.OutputField(
    desc="The query string to use for querying relevant texts.\n"
    'If retriever is "vector", write a passage that might be semantically similar to the real answer to the question.\n'
    'If retriever is "keyword", generate some keywords that might appear in the answer to the question.'
)

to both signatures. However, the second approach simply failed with the message

ValueError: ('Too many retries trying to get the correct output format. Try simplifying the requirements.', {'retriever_type': "ValueError('json output should start and end with { and }')"})

I think this might be related to #957, #1001, #1057, #1125, and #1246, and it could be caused by Llama 3 8B's relatively poor function-calling ability. Still, the first approach works better in this case as well.

Therefore, I wonder what the best practice for implementing multiple choice is. Also, as a side question: am I providing too much prompting detail regarding the retrievers?
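
For reference, the combined selector-plus-query version I have in mind with plain ChainOfThought + Suggest would look roughly like this (just a sketch, not something I have fully validated; the second suggestion on query is only an illustration):

class RetrieverAndQuerySelector(dspy.Signature):
    """Choose the best retriever type and write a query for it."""

    question = dspy.InputField(desc="The question to be answered.")
    retriever_type = dspy.OutputField(
        desc="The best type of retriever to use for the given question.\n"
        '"vector": Retrieves texts that are semantically similar to the query.\n'
        '"keyword": Retrieves texts that contain the same keywords used in the query.'
    )
    query = dspy.OutputField(
        desc="The query string to use for querying relevant texts.\n"
        'If retriever is "vector", write a passage that might be semantically similar to the real answer.\n'
        'If retriever is "keyword", generate some keywords that might appear in the answer.'
    )


selector = dspy.ChainOfThought(RetrieverAndQuerySelector)
s = selector(question=question)
dspy.Suggest(
    s.retriever_type in ["vector", "keyword"],
    'The retriever type should be either "vector" or "keyword".',
)
dspy.Suggest(
    len(s.query.strip()) > 0,
    "The query should not be empty.",
)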

arnavsinghvi11 commented 1 month ago

Hi @theta-lin , thanks for sharing this use case.

Both assertions and typed predictors are viable for this. Assertions have been tested more across some DSPy optimizers (BootstrapFewShot, not the MIPROs) than typed predictors, which can make them easier to use for this task.
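
For instance, an assertion-activated program can be compiled with BootstrapFewShot roughly like this (a sketch only; metric, trainset, and the MyProgram module are placeholders for your own setup):

from dspy.teleprompt import BootstrapFewShot

# Placeholders: supply your own metric, trainset, and dspy.Module with suggestions inside.
teleprompter = BootstrapFewShot(metric=metric, max_bootstrapped_demos=4)
compiled_program = teleprompter.compile(
    MyProgram().activate_assertions(),
    trainset=trainset,
)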

theta-lin commented 1 month ago

@arnavsinghvi11 Thanks for the clarification. I guess I would lean more towards assertions for now, as Llama 3 8B appears not so good at producing structured output.

A follow-up question: as shown above, TypedChainOfThought does not appear to show any actual "chain of thought" in its reasoning.

Question: Who is John Doe?

Reasoning: Let's think step by step in order to
{
"value": "keyword"
}

Retriever Type:
{
"value": "keyword"
}

I wonder whether this is the intended behavior?

kalanyuz commented 1 month ago

Bumping this thread because I have started using optimizers with TypedChainOfThought, and it resulted in worse model quality because it repeats the output in its reasoning as well. This seems to be random and happens with a 50-50 chance when running the evaluations.

However, if I save the model to a file, then the reasoning becomes the output description 100% of the time.

cc: @arnavsinghvi11