stanfordnlp / dspy

DSPy: The framework for programming—not prompting—foundation models
https://dspy-docs.vercel.app/
MIT License

curious about the prompt layout for language models #10

Closed Maxlinn closed 1 year ago

Maxlinn commented 1 year ago

Hi @okhat, thanks for open-sourcing this! I have two questions, if you would like to help:

  1. When I was playing with the demo, I noticed that the actual prompt string (fed into the language model) contains symbols like ${} and ---. I am new to this area; is that some special usage for the model? The layout of the references in the Context field also seems unfamiliar to me. As an example:
    
    Write a search query that will help answer a complex question.

Follow the following format.

Context: ${sources that may contain relevant content}

Question: ${the question to be answered}

Rationale: Let's think step by step. Based on the context, we have learned the following. ${information from the context that provides useful clues}

Search Query: ${a simple question for seeking the missing information}


Context: [1] «Right Back at It Again | at the Kerrang! Awards. Personnel per digital booklet. Right Back at It Again "Right Back at It Again" is the second track and the first single from A Day to Remember's fifth album, "Common Courtesy" (2013). In October 20, 2015, the song was featured in Activision rhythm-music game, "". Vocalist, Jeremy McKinnon wrote the lyrics, while the music was written by McKinnon, former guitarist Tom Denney, guitarist Neil Westfall and producer Andrew Wade. "Right Back at It Again" almost wasn't included on the album as it was one of the excess songs the band had recorded, "we realised that it» [2] «Right Back at It Again | Right Back at It Again "Right Back at It Again" is the second track and the first single from A Day to Remember's fifth album, "Common Courtesy" (2013). In October 20, 2015, the song was featured in Activision rhythm-music game, "". Vocalist, Jeremy McKinnon wrote the lyrics, while the music was written by McKinnon, former guitarist Tom Denney, guitarist Neil Westfall and producer Andrew Wade. "Right Back at It Again" almost wasn't included on the album as it was one of the excess songs the band had recorded, "we realised that it sounded great, so on it went." "Right Back»

Question: Right Back At It Again contains lyrics co-written by the singer born in what city?


  2. Where do the QA pairs for in-context learning come from?
You mentioned that, along with the instructions, a few QA pairs help (in the variable `train: list`) to define the task, but I am curious about how to select QA pairs to build such a list.
In the demo, they were just given:

# (question, list of acceptable answer strings) pairs that help define the task
train = [
    ('Who produced the album that included a re-recording of "Lithium"?', ['Butch Vig']),
    ('Who was the director of the 2009 movie featuring Peter Outerbridge as William Easton?', ['Kevin Greutert']),
    ('The heir to the Du Pont family fortune sponsored what wrestling team?', ['Foxcatcher', 'Team Foxcatcher', 'Foxcatcher Team']),
    ('In what year was the star of To Hell and Back born?', ['1925']),
    ('Which award did the first book of Gary Zukav receive?', ['U.S. National Book Award', 'National Book Award']),
    ('What city was the victim of Joseph Druces working in?', ['Boston, Massachusetts', 'Boston']),
]

train = [dsp.Example(question=question, answer=answer) for question, answer in train]



Thanks a lot for your kind help!
stalkermustang commented 1 year ago

Hi @Maxlinn , I'm not a team member, but I'll try to answer.

  1. These are "prompt engineering" techniques. They help the model (GPT-3.5 in this case) understand the format and the instruction, i.e. what we want it to do. `${a simple question for seeking the missing information}`, for example, looks like a placeholder with a description: the model sees it and "thinks" (following the pattern, not actually thinking) "OK, there should be a question here, based on the info I have and the info I don't have." Thus, when we feed it a prompt ending with "Question: ", the model knows what we expect from it. The --- lines are IMO just formatting splitters; they help the model recognize the boundaries between blocks. I believe quality wouldn't degrade if we removed them; the textual instructions are the most important part of the prompt. (I've sketched below, after point 2, how such a prompt could be put together.)

  2. I'm not sure where these particular questions came from, but you can find very similar ones in the HotPotQA dataset; you can easily get hundreds of similar knowledge-based questions there (see the loading sketch below).
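
To make the placeholder idea from point 1 concrete, here's a rough sketch (my own illustration, not DSP's actual internals) of how a prompt like the one you quoted could be assembled: each field in the "format" block is just its name plus a ${description} marker, and --- separates blocks such as demonstrations from the final example the model should complete.

```python
# My own illustration, not the library's real code: assemble a prompt from an
# instruction, a format block whose fields carry ${description} placeholders,
# and a final block that stops at the point the model should continue from.
instructions = "Write a search query that will help answer a complex question."

format_lines = [
    "Context: ${sources that may contain relevant content}",
    "Question: ${the question to be answered}",
    "Rationale: Let's think step by step. Based on the context, we have learned the following. "
    "${information from the context that provides useful clues}",
    "Search Query: ${a simple question for seeking the missing information}",
]

prompt = (
    instructions
    + "\n\nFollow the following format.\n\n"
    + "\n\n".join(format_lines)
    + "\n\n---\n\n"  # "---" acts as a separator between blocks (e.g. demonstrations)
    + "Context: [1] «...» [2] «...»"  # the retrieved passages would be pasted here
    + "\n\nQuestion: Right Back At It Again contains lyrics co-written by the singer born in what city?"
)
print(prompt)  # the model then continues with "Rationale: ..." and "Search Query: ..."
```

And for point 2, here's one way (just a sketch; I don't know where the demo's exact pairs came from) to pull similar (question, answers) pairs from HotPotQA with the Hugging Face `datasets` library:

```python
# Sketch: pull (question, [answer]) pairs from HotPotQA via the Hugging Face
# `datasets` library and shape them like the demo's `train` list. Note that the
# demo's answer lists also contain hand-added aliases; HotPotQA itself only
# provides a single answer string per question.
from datasets import load_dataset

hotpot = load_dataset("hotpot_qa", "fullwiki", split="train")

train = [(ex["question"], [ex["answer"]]) for ex in hotpot.select(range(6))]
# train = [dsp.Example(question=q, answer=a) for q, a in train]  # as in the demo
```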

stalkermustang commented 1 year ago

> the layout of references in Context field also seems unfamiliar.

Do you mean the "Context" section? The rows there are just numbers, titles (of wiki pages), and the associated wiki chunks (not whole pages, just parts). So the model has access to this information during generation and can look at and copy from these passages, e.g. factual details like names or dates.
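
For what it's worth, the rows in that Context block look like they could be produced by formatting as simple as this (a guess at the layout on my part, not the library's actual code):

```python
# Guess at how the Context rows above are laid out: number each retrieved
# passage and put its wiki title and text chunk inside « » markers.
passages = [
    ("Right Back at It Again", "at the Kerrang! Awards. Personnel per digital booklet. ..."),
    ("Right Back at It Again", '"Right Back at It Again" is the second track and the first single ...'),
]

context = " ".join(
    f"[{i}] «{title} | {text}»" for i, (title, text) in enumerate(passages, start=1)
)
print(context)  # -> [1] «Right Back at It Again | at the Kerrang! ...» [2] «...»
```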

Maxlinn commented 1 year ago

Hi @stalkermustang, many thanks for your careful explanation :)

Yes, I know the semantic purpose of each component of the prompt; I'm just wondering why it uses the $ symbol (instead of something like #, &, % or anything else), and why it uses braces as delimiters. It makes the prompt look more like a programming language than natural language.

For now I think the format is not of great importance: the attention mechanism will identify the useful parts of the prompt and ignore the others, so slight changes to the symbols should not greatly affect the results.

As for the train question-answer pairs, I think they can be (semantically) searched from a huge QA dataset (like the HotPotQA you pointed me to).
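
For example, something along these lines (just a sketch of the idea; the embedding model and the seed description are placeholders I picked, not anything the demo prescribes):

```python
# Sketch of semantically selecting train pairs from a large QA dataset: embed
# the candidate questions, embed a short description of the target task (or a
# seed question), and keep the nearest neighbours.
import numpy as np
from sentence_transformers import SentenceTransformer

qa_pairs = [
    ('Who produced the album that included a re-recording of "Lithium"?', ['Butch Vig']),
    ('In what year was the star of To Hell and Back born?', ['1925']),
    # ... in practice, thousands of pairs from a dataset such as HotPotQA
]
candidate_questions = [q for q, _ in qa_pairs]

model = SentenceTransformer("all-MiniLM-L6-v2")
question_emb = model.encode(candidate_questions, normalize_embeddings=True)
seed_emb = model.encode("multi-hop questions about people, films and music",
                        normalize_embeddings=True)

scores = question_emb @ seed_emb   # cosine similarity, since embeddings are normalised
top_ids = np.argsort(-scores)[:6]  # indices of the closest candidate questions
train = [qa_pairs[i] for i in top_ids]
```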

Thanks again for your response; I'd like to mark this issue as closed!