hy-struggle / PRGC

PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction

Some issues w.r.t the dataset. #13

Open Rosenberg37 opened 2 years ago

Rosenberg37 commented 2 years ago

Why are there duplicate triples? For example, the second entry in the validation data of the NYT dataset:

    {
        "text": "In his authoritative and tough-minded new book , '' The Assassins ' Gate : America in Iraq , '' the New Yorker writer George Packer reminds us that the decision of the Bush administration to go to war against Iraq and its increasingly embattled handling of the occupation were both predicated upon large , abstract ideas about the role of America in the post-cold war world -- most notably , a belief in pre-emptive and unilateral action , the viability of exporting democracy abroad , the urge to streamline the military and the dream of remaking the Middle East .",
        "triple_list": [
            [ "Middle East", "/location/location/contains", "Iraq" ],
            [ "Middle East", "/location/location/contains", "Iraq" ]
        ]
    }

This entry contains two identical triples, "Middle East /location/location/contains Iraq".

In NYT-star, the same entry is:

    {
        "text": "In his authoritative and tough-minded new book , '' The Assassins ' Gate : America in Iraq , '' the New Yorker writer George Packer reminds us that the decision of the Bush administration to go to war against Iraq and its increasingly embattled handling of the occupation were both predicated upon large , abstract ideas about the role of America in the post-cold war world -- most notably , a belief in pre-emptive and unilateral action , the viability of exporting democracy abroad , the urge to streamline the military and the dream of remaking the Middle East .",
        "triple_list": [
            [ "East", "/location/location/contains", "Iraq" ]
        ]
    }

Here the triple appears only once.

Is the duplication in NYT intended? Does it affect the final performance reported in the experiments?
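To check how widespread this is, a minimal sketch along the following lines could be used. It assumes each entry is a dict with `text` and `triple_list` keys, as in the entries quoted above; the helper names `duplicate_triples` and `dedupe_triples` are hypothetical and not part of the PRGC code, and the example text is shortened for brevity.

```python
from collections import Counter


def duplicate_triples(example):
    """Return {triple: count} for triples occurring more than once in one entry."""
    counts = Counter(tuple(t) for t in example["triple_list"])
    return {triple: n for triple, n in counts.items() if n > 1}


def dedupe_triples(example):
    """Return a copy of the entry with repeated triples removed (first-seen order kept)."""
    unique = dict.fromkeys(tuple(t) for t in example["triple_list"])
    return {**example, "triple_list": [list(t) for t in unique]}


# Shortened copy of the NYT entry quoted above (text truncated, triples unchanged).
example = {
    "text": "... the dream of remaking the Middle East .",
    "triple_list": [
        ["Middle East", "/location/location/contains", "Iraq"],
        ["Middle East", "/location/location/contains", "Iraq"],
    ],
}

print(duplicate_triples(example))
print(dedupe_triples(example)["triple_list"])
```

Running `duplicate_triples` over every entry in a split would show how many entries are affected. Whether duplicates change the reported scores depends on how the evaluation code counts gold triples, which this sketch does not examine.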