orionw / FollowIR

FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions
https://arxiv.org/abs/2403.15246

finetuning stage dataset specification #5

Closed rifaaQ closed 3 weeks ago

rifaaQ commented 1 month ago

I was wondering if the "document" column in FollowIR-train is used during the finetuning stage.

I have modified the JSON in LLaMA-Factory as the following documentation states: We used LLaMA-Factory to fine-tune Mistral to create FollowIR-7B, after transforming it to fit their format (input of "query" + "instruction" inside the template, output is the label, and instruction as the beginning of the template)

"FollowIR": {
    "hf_hub_url": "FollowIR-train.jsonl",
    "columns": {
        "prompt": ["instruction", "query"],
        "response": "label"
    }
},

Please let me know if there are any insights on how the documents column was used. Thank you.

orionw commented 1 month ago

Thanks for the question! If you're using LLaMA-Factory at commit 60733275f8ebc7401bfaa43a18525757c2c4194a then this is the format of the dataset (not sure why it's doing the weird highlighting). The HF one is in a slightly different format since it's easier to use; this one combines some fields:

[
    {
        "id": "TREC2015-MB-testtopics.txt---MB256",
        "instruction": "You are an expert Google searcher, whose job is to determine if the following document is relevant to the query (true/false). Answer using only one word, one of those two choices.\n",
        "input": "Query: Find information on the effect of the Pope using social media has on the beliefs and behavior of young Catholics. Pope Francis is trying to infuse Catholic beliefs in younger generations by using social media, including Twitter and Facebook.  The user is looking for evidence that this is an effective method to communicate a set of specific beliefs as well as whether such communication influences young people's behavior and perspectives about Catholicism.\nDocument: As Pope Francis continues to utilize social media platforms such as Twitter and Facebook to communicate Catholic beliefs to younger generations, there is growing evidence to suggest that this method is proving to be effective. Studies have shown that young people are increasingly turning to social media for information and guidance, and the Pope's active presence on these platforms allows Catholic beliefs to reach a wider audience. Furthermore, research has indicated that social media can have a significant impact on shaping young people's perspectives and behaviors. By engaging with the Pope's messages on social media, young individuals are exposed to Catholic teachings and are more likely to internalize and adopt these beliefs into their daily lives. It is evident that the use of social media as a tool for communicating specific Catholic beliefs is not only effective but also influential in shaping the perspectives and behavior of young people.\nRelevant (only output one word, either \"true\" or \"false\"):",
        "output": "true"
    },
    ...
]

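Note that the instruction is fixed, and that the input puts all the elements together (the query with its instruction, then the document), with the output as the label. So the document is used during finetuning: it is part of the input text.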
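For anyone doing the same conversion, here is a rough sketch of how one row could be turned into a record shaped like the example above. It is not the script that was actually used. The row keys ("id", "query", "instruction", "document", "label") are assumptions about the HF columns, the helper name `to_llama_factory_record` is made up for illustration, and joining query and instruction with a single space is inferred from the example record, so check all of these against the real data.

```python
import json

# Fixed instruction text, copied from the example record in this thread.
FIXED_INSTRUCTION = (
    "You are an expert Google searcher, whose job is to determine if the "
    "following document is relevant to the query (true/false). Answer using "
    "only one word, one of those two choices.\n"
)


def to_llama_factory_record(row: dict) -> dict:
    """Build one record in the instruction/input/output layout shown above.

    Assumed (hypothetical) row keys: "id", "query", "instruction",
    "document", "label". The label is assumed to be a bool or the
    string "true"/"false".
    """
    # Assumption: query and its instruction are joined by a single space.
    query_text = f'{row["query"]} {row["instruction"]}'.strip()
    input_text = (
        f"Query: {query_text}\n"
        f"Document: {row['document']}\n"
        'Relevant (only output one word, either "true" or "false"):'
    )
    return {
        "id": row.get("id", ""),
        "instruction": FIXED_INSTRUCTION,
        "input": input_text,
        "output": str(row["label"]).lower(),
    }


if __name__ == "__main__":
    # Toy row with placeholder values, only to show the resulting shape.
    example_row = {
        "id": "example-id",
        "query": "Example query.",
        "instruction": "Example narrative.",
        "document": "Example document text.",
        "label": True,
    }
    print(json.dumps([to_llama_factory_record(example_row)], indent=4))
```

Writing the list of converted records to a JSON file gives something in the same layout as the snippet above, which can then be registered as a local dataset in LLaMA-Factory.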