kelvin-jiang / FreebaseQA

The release of the FreebaseQA data set (NAACL 2019).
Creative Commons Attribution 4.0 International

Hi, I found that there may be some redundancy in the questions. Is that right? #3

Open simba0626 opened 5 years ago

simba0626 commented 5 years ago

For example: Who produced the film 12 Angry Men, which was scripted by Reginald Rose, starred Henry Fonda and was directed by Sidney Lumet?

Annotation information: "TopicEntityName": "12 angry men", "TopicEntityMid": "m.0m_tj", "InferentialChain": "film.film.produced_by",
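To make the annotation concrete: the three fields amount to a single one-hop lookup of the form (topic entity, relation, ?answer). The sketch below assumes a one-hop chain as in this example and uses a tiny in-memory dictionary as a stand-in for Freebase. Apart from `m.0m_tj` and the relation name taken from the annotation, the MIDs are hypothetical placeholders, and `answer_from_annotation` is an illustrative helper, not part of the dataset's tooling.

```python
# Toy stand-in for Freebase: (subject MID, relation) -> set of object MIDs.
# "m.0m_tj" and "film.film.produced_by" come from the annotation above;
# the answer MIDs are hypothetical placeholders.
TOY_KB = {
    ("m.0m_tj", "film.film.produced_by"): {"m.producer_a", "m.producer_b"},
}


def answer_from_annotation(annotation, kb):
    """Follow the InferentialChain from the topic entity (assumed to be one hop)."""
    key = (annotation["TopicEntityMid"], annotation["InferentialChain"])
    return kb.get(key, set())


annotation = {
    "TopicEntityName": "12 angry men",
    "TopicEntityMid": "m.0m_tj",
    "InferentialChain": "film.film.produced_by",
}
print(sorted(answer_from_annotation(annotation, TOY_KB)))
# ['m.producer_a', 'm.producer_b']
```

Nothing in this lookup uses the rest of the question text, which is why the "scripted by / starred / directed by" clauses look redundant from the annotation's point of view.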

The text "which was scripted by Reginald Rose, starred Henry Fonda and was directed by Sidney Lumet?" seems redundant. Is that all right?

Thank you for your help.

kelvin-jiang commented 5 years ago

Hi there,

These questions were pulled from trivia datasets and sites; trivia questions are sometimes designed to have multiple clues to make it easier for contestants to answer. That is the case here, but our Freebase triples can only represent one of these clues, which in this case was the "Who produced the film" part of the question.
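One way to see what the extra clues do: each clause can be read as a constraint on the topic entity, while only the "Who produced" clause asks for the answer. The sketch below is a hypothetical illustration: the entity IDs and toy KB are invented, the relation names are simplified and may not match the exact Freebase schema, and `satisfies`/`answer` are illustrative helpers.

```python
# Hypothetical toy KB: (entity, relation) -> set of objects. Entity IDs are
# invented and relation names are simplified; they may not match the real
# Freebase schema exactly.
KB = {
    ("film_1", "film.film.produced_by"): {"producer_a"},
    ("film_1", "film.film.written_by"): {"reginald_rose"},
    ("film_1", "film.film.starring"): {"henry_fonda"},
    ("film_1", "film.film.directed_by"): {"sidney_lumet"},
    ("film_2", "film.film.produced_by"): {"producer_b"},
    ("film_2", "film.film.directed_by"): {"director_b"},
}
CANDIDATES = ["film_1", "film_2"]  # hypothetical entities matching "12 angry men"

# Each clue is (relation, expected object); None marks the slot being asked for.
CLUES = [
    ("film.film.produced_by", None),             # "Who produced the film ..."
    ("film.film.written_by", "reginald_rose"),   # "scripted by Reginald Rose"
    ("film.film.starring", "henry_fonda"),       # "starred Henry Fonda"
    ("film.film.directed_by", "sidney_lumet"),   # "directed by Sidney Lumet"
]


def satisfies(entity, clues, kb):
    """True if the entity matches every constraint clue (the asked slot is skipped)."""
    return all(value in kb.get((entity, rel), set())
               for rel, value in clues if value is not None)


def answer(candidates, clues, kb):
    """Filter candidate topic entities by the constraints, then follow the asked relation."""
    asked = next(rel for rel, value in clues if value is None)
    return {obj
            for entity in candidates if satisfies(entity, clues, kb)
            for obj in kb.get((entity, asked), set())}


print(sorted(answer(CANDIDATES, CLUES, KB)))      # ['producer_a']
print(sorted(answer(CANDIDATES, CLUES[:1], KB)))  # ['producer_a', 'producer_b']
```

With only the asked clue, both hypothetical candidates survive; with the constraint clues, only one does. When the topic entity MID is already given, as in the FreebaseQA annotation, those constraints add nothing to the annotated lookup.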

Kelvin

simba0626 commented 5 years ago

Hi, thank you for your reply. I have two points of confusion about this data and would appreciate your help:
First, I think it may be hard for a machine to work out which clue is the useful one, unless it tries every clue against the Freebase schema. Second, even if a machine does not understand the whole question, it can still answer it by solving part of it, for example by picking one clue at random from the multiple clues. I am not sure whether my understanding is correct.
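Both points can be illustrated with a brute-force sketch: enumerate every relation leaving the topic entity and keep those whose objects include a known answer. This assumes the gold answer is available (as it would be during training) and uses a hypothetical toy KB; `relations_reaching` is an illustrative name.

```python
# Hypothetical toy KB: (subject, relation) -> set of objects.
TOY_KB = {
    ("film_x", "film.film.produced_by"): {"person_a", "person_b"},
    ("film_x", "film.film.starring"): {"person_a", "person_c"},
    ("film_x", "film.film.directed_by"): {"person_d"},
}


def relations_reaching(topic, gold_answer, kb):
    """Return every relation from `topic` whose objects contain `gold_answer`."""
    return sorted(
        relation
        for (subject, relation), objects in kb.items()
        if subject == topic and gold_answer in objects
    )


print(relations_reaching("film_x", "person_a", TOY_KB))
# ['film.film.produced_by', 'film.film.starring']
print(relations_reaching("film_x", "person_b", TOY_KB))
# ['film.film.produced_by']
```

Here two relations reach `person_a`, so matching the answer alone cannot tell which clue the question is asking about (the first point), while following either relation still returns that answer (the second point), although `starring` also returns `person_c`.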

yawei

kelvin-jiang commented 5 years ago

I'm not sure if I understand your questions, but I will try to answer them as best I can. Regarding your first point, if you train a KBQA pipeline, you will likely need to train a relation detection model at some point to choose which clue to use. Regarding your second point, what you said may well be correct: since the questions in FreebaseQA are more complicated and sometimes contain multiple clues, there may be multiple "ways" to reach the correct answer.
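A relation detection step can be sketched as ranking the topic entity's candidate relations against the question text. In a real pipeline the scorer would be a trained model; the version below swaps in a hypothetical rule-based heuristic (favoring the word right after the wh-word, then token overlap) purely for illustration, and the candidate relation list is assumed.

```python
import re

WH_WORDS = {"who", "what", "which", "where", "when"}


def tokens(text):
    """Lowercase alphanumeric tokens; underscores and punctuation act as separators."""
    return re.findall(r"[a-z0-9]+", text.lower())


def detect_relation(question, candidates):
    """Pick the best-scoring candidate relation (toy heuristic, not a trained model)."""
    words = tokens(question)
    # Hypothetical heuristic: treat the word after the first wh-word as the
    # predicate being asked about ("Who produced ..." -> "produced").
    head = next((words[i + 1] for i, w in enumerate(words[:-1]) if w in WH_WORDS), None)

    def score(relation):
        rel_words = tokens(relation.split(".")[-1])  # "produced_by" -> ["produced", "by"]
        bonus = 10 if head in rel_words else 0
        return bonus + len(set(rel_words) & set(words))

    return max(candidates, key=score)


CANDIDATES = [  # assumed candidate relations for the topic entity
    "film.film.directed_by",
    "film.film.produced_by",
    "film.film.written_by",
    "film.film.starring",
]
question = ("Who produced the film 12 Angry Men, which was scripted by Reginald Rose, "
            "starred Henry Fonda and was directed by Sidney Lumet?")
print(detect_relation(question, CANDIDATES))
# film.film.produced_by
```

Plain token overlap alone would tie `produced_by` and `directed_by` here, since the question mentions both "produced" and "directed"; that kind of ambiguity is what a learned relation detection model is meant to resolve.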