Why are there so many "null" values in the dataset?

XitaoLi commented 1 year ago

As the title says, when I load the dataset and try to reproduce the results from your paper, I find that many of the labels are "null".

 {
  "question": "the south west wind blows across nigeria between",
  "golden_answer": "till September",
  "answer_fid": "February",
  "judge_fid": null,
  "answer_gpt35": "the months of\n\nNovember and March.",
  "judge_gpt35": null,
  "answer_chatgpt": "the months of May and September. This wind is known as the \"Harmattan\" and is characterized by dry and dusty conditions. It originates from the Sahara desert and brings with it a haze that reduces visibility and can cause respiratory problems. Despite these challenges, the Harmattan also brings cooler temperatures, making it a welcome relief from the hot and humid conditions of the rest of the year.",
  "judge_chatgpt": true,
  "answer_gpt4": "The southwest wind, also known as the Southwest Monsoon or the Southwesterlies, typically blows across Nigeria between June and September, bringing rain and humidity.",
  "judge_gpt4": null,
  "answer_newbing": "nan.",
  "judge_newbing": null,
  "improper": true
 },

Is it a mistake?

XitaoLi commented 1 year ago

More details: null also appears when the "improper" key is false.

 {
  "question": "who played cruella de vil in 101 dalmatians",
  "golden_answer": "Glenn Close",
  "answer_fid": "Glenn Close",
  "judge_fid": true,
  "answer_gpt35": "Cruella de Vil was played by Glenn Close in the 1996 live-action version of 101 Dalmatians.",
  "judge_gpt35": true,
  "answer_chatgpt": "Cruella De Vil was played by Glenn Close in the 1996 live-action film \"101 Dalmatians.\"",
  "judge_chatgpt": true,
  "answer_gpt4": null,
  "judge_gpt4": null,
  "answer_newbing": "In the 1996 live-action adaptation of \u201c101 Dalmatians\u201d, the character of Cruella de Vil was portrayed by actressGlenn Close1.",
  "judge_newbing": true,
  "improper": false
 },

wangcunxiang commented 1 year ago

Because of GPT-4 API access restrictions, we did not annotate all of the NQ-GPT4 data. We annotated only 500 examples, as shown in Table 1 of the paper.
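
For anyone who hits the same thing, here is a minimal sketch of how the null labels could be counted and skipped when summarizing the judgements. It assumes the dataset is a JSON list of records shaped like the two examples above; the helper name `summarize_judgements` and the `true_rate` field are made up for illustration and are not part of the repository. The two inline records are trimmed copies of the examples in this thread.

```python
# Model suffixes taken from the keys in the records above ("judge_<model>").
MODELS = ["fid", "gpt35", "chatgpt", "gpt4", "newbing"]


def summarize_judgements(records, models=MODELS):
    """Count annotated vs. missing (null) judgements for each model.

    Hypothetical helper: `true_rate` is the fraction of annotated records
    whose judgement is true, computed over non-null labels only.
    """
    summary = {}
    for model in models:
        labels = [record.get(f"judge_{model}") for record in records]
        annotated = [label for label in labels if label is not None]
        summary[model] = {
            "annotated": len(annotated),
            "missing": len(labels) - len(annotated),
            # No annotated labels means there is no rate to report.
            "true_rate": sum(annotated) / len(annotated) if annotated else None,
        }
    return summary


# Trimmed copies of the two records quoted in this thread (JSON null -> None).
records = [
    {"judge_fid": None, "judge_gpt35": None, "judge_chatgpt": True,
     "judge_gpt4": None, "judge_newbing": None, "improper": True},
    {"judge_fid": True, "judge_gpt35": True, "judge_chatgpt": True,
     "judge_gpt4": None, "judge_newbing": True, "improper": False},
]

summary = summarize_judgements(records)
for model, stats in summary.items():
    print(model, stats)
```

To run this on the full file, the records could be loaded with `json.load` from wherever the dataset is stored locally. JSON `null` becomes Python `None`, so the same filter applies.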

wangcunxiang / QA-Eval
