rasbt / LLMs-from-scratch

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167
Other
32.84k stars 3.94k forks source link

Reflection Finetuning #350

Closed d-kleine closed 2 months ago

d-kleine commented 2 months ago

About ch07/05_dataset-generation/reflection-gpt4.ipynb:

grafik

Running the examples in this notebook costs about \$0.30 (30 cents) with GPT-4o-mini as of this writing

-> Does "examples" mean all 1100 entries or the few ones selected in the notebook?

def instr_prompt_no_input(ins, outp):
    sys_prompt = "You are a helpful, precise but picky assistant for checking the quality of a given instruction."
    prompt_template = """
    <thinking>
    [Instruction]
    {ins}   
    [The Start of Answer]
    {outp}
    [The End of Answer]
    </thinking>
    <reflection>
    [System]
    {criteria}
    </reflection>
    <output>
    [Final Answer]
    Provide a detailed analysis and suggestions based on the reflection.
    </output>
    """
    criteria = "We would like you to answer several questions related to the quality of a given instruction. \n" + \
                "1. Why is this instruction not good? Analyze the instruction based on complexity, detail, knowledge required, ambiguity, and reasoning involved. \n" + \
                "2. Why is the answer not good for the given instruction? Analyze based on helpfulness, relevance, accuracy, and detail level. \n" + \
                "3. Generate a new complex instruction and provide a detailed answer."
    prompt = prompt_template.format(
        ins=ins, outp=outp, criteria=criteria
    )
    return sys_prompt, prompt

P.S.: There is also a reflection-finetuned Llama 3.1 model available on Ollama: https://ollama.com/library/reflection

rasbt commented 2 months ago

Thanks for the feedback!

Regarding

With reference to the reflection finetuning approach, what do you think about adding , and tags to the prompt like here:

thanks for mentioning that. I prefer to leave it similar to the paper for now until there is some external, independent evidence in favor of the approach you referenced. It's just that there's been a lot of controversy around their benchmarks, and I don't really trust the results until I see multiple independent pieces of evidence.

-> Does "examples" mean all 1100 entries or the few ones selected in the notebook?

That meant the notebook as is, not the 1100 entries. But where it said

Running the examples in this notebook costs about $0.30 (30 cents) with GPT-4o-mini as of this writing

this should have been $0.03 (3 cents). I forgot a 0 there. Thanks for calling that out.

P.S.: There is also a reflection-finetuned Llama 3.1 model available on Ollama: https://ollama.com/library/reflection

Oh nice, small world!

rasbt commented 2 months ago

Clarified the cost

d-kleine commented 2 months ago

thanks for mentioning that. I prefer to leave it similar to the paper for now until there is some external, independent evidence in favor of the approach you referenced. It's just that there's been a lot of controversy around their benchmarks, and I don't really trust the results until I see multiple independent pieces of evidence.

I see, thanks!

this should have been $0.03 (3 cents). I forgot a 0 there. Thanks for calling that out.

Yeah, that confused me a little. Thanks for updating!

Running the notebook cost me $0.002, so even less. Whether it's $0.003 (0.3 cents) or $0.03 (3 cents), it's not too cost-intensive (as this might be critical point for some readers/users).