hkust-nlp / deita

Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
Apache License 2.0
458 stars 28 forks source link

[Question] Script to train Scorer model? #11

Closed agi-piggy closed 6 months ago

agi-piggy commented 7 months ago

Hi the team,

Great work! I wonder whether it is possible to publish your training script for scoring model? I find my local version not working. Thanks!

VPeterV commented 7 months ago

Hi, you can refer to our scripts for mistral-scorers which perform better than llama-1-13b on our validation set. It is an experimental version, we will release more details including scripts for llama-scorers in our formal version.

And please convert your data to sharegpt format:

 [ {
    "conversations": [
      {
        "value": "You are a helpful assistant. Please identify the complexity score of the following user query. \n##Query: Add a requirement that the twist ending must involve a secondary protagonist who has been secretly manipulating the main character's journey for their own gain. The mystical artifact should have a deeper significance beyond its initial appearance, and the twist should reveal a larger conspiracy or power struggle at play. The reader should be left questioning the true nature of the protagonist's actions and the morality of their choices. \n##Complexity: ",
        "from": "human"
      },
      {
        "value": "6",
        "from": "gpt"
      }
    ]
  },
  {
    "conversations": [
      {
        "value": "You are a helpful assistant. Please identify the complexity score of the following user query. \n##Query: Given the concerning problem of hypoxia in Lake Michigan and its negative impact on the fish population, craft a comprehensive and attention-grabbing headline that not only summarizes the current state of affairs but also highlights the extensive consequences and the pressing need for immediate and effective actions to address this ecological crisis. \n##Complexity: ",
        "from": "human"
      },
      {
        "value": "5",
        "from": "gpt"
      }
    ]
  }]