How to format dataset fields in model prompt?

Hi I'm looking to finetune an LLM using this dataset, and was wondering if there's any advice on how to format the prompt given the instruction vs input fields?

For example consider these entries:

  {
    "output":"The author has used personification in the sentence \"The cold breeze chills my bones.\" Personification is a figure of speech in which a non-human subject is given human characteristics. In this case, the non-human subject is the cold breeze, which is given the human characteristic of being able to chill someone's bones.",
    "input":"The cold breeze chills my bones.",
    "instruction":"Identify a stylistic device used by the author in the following sentence."
  }

 {
    "output":"Two players from the Kansas City Chiefs team are Patrick Mahomes and Tyreek Hill.",
    "input":"",
    "instruction":"Name two players from the Chiefs team?"
  }

I imagine two approaches:

Use the "instruction" as the system prompt, and the "input" as the first user chat message (which would often be empty though)...
Concatenate the instruction + input fields into a single (first) user chat message.

I think I'll use approach 2 but would appreciate any insights or references on this topic :)

gururise / AlpacaDataCleaned

How to format dataset fields in model prompt? #63