dniku opened this issue 3 months ago
Thanks for filing the issue! Can you please check whether these questions are from the inaccessible samples or the accessible samples?
These are all fact questions, so I'm not sure how they can be inaccessible.
Oh, the fact questions are only used to measure token F1 scores, so we only use the `correct_answer`!
Since you publish both `correct_answer` and `wrong_answer`, do you think it would be a good idea to make both valid?
What do you mean by making them both valid? Can you please explain a bit more?
I mean making `wrong_answer` a possible wrong answer to the question. Currently you are publishing this field as part of the dataset, but it appears that it is not guaranteed to contain a string that could be an incorrect answer, given that in a few cases it is equal to `correct_answer`.

This also leads me to suspect that `wrong_answer` might not be an actual wrong answer in other cases as well, though those would not be as trivial as the ones I reported here. Would it be possible to check that all strings you publish as `wrong_answer` are actual wrong answers?
Gotcha, I went through those instances and now I understand where the misunderstanding came from.

The reason those `correct_answer` and `wrong_answer` values are the same is that they come from `accessible` instances (i.e., conversations). The `accessible` instances are those where there is no information asymmetry regarding the fact question. In other words, the question can be answered the same way whether it's based on the part of the conversation in which character X was absent or the part in which X was involved. This is why those instances were labeled as `accessible` when we built the dataset. So all fact questions in `accessible` instances actually have very similar `correct_answer` and `wrong_answer` values, and it looks like there are even identical ones, as you've reported. Hope this clears things up!

Maybe I should have made `wrong_answer` contain empty strings for the fact questions, or labeled them with a different name, to minimize misunderstandings. Sorry for the confusion 🙏🏻 Please let me know if there are other issues!
Inspecting the dataset with:

I get:

which means that there are some items where the `factQA` field has identical values for `correct_answer` and `wrong_answer`. Is this an error in the dataset?