thunlp / DocRED

Dataset and codes for ACL 2019 DocRED: A Large-Scale Document-Level Relation Extraction Dataset.
MIT License

How to align wikipedia text with wikidata #13

Closed caoh18 closed 5 years ago

caoh18 commented 5 years ago

Could you please share the code for aligning Wikipedia text with Wikidata to obtain the relations? I'm struggling to use the Wikidata Query Service to find the relations between two known entities. Thank you!

YeDeming commented 5 years ago

You can download the wikidata dump from https://dumps.wikimedia.org/wikidatawiki/20190701/ to get all triples.
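To make "get all triples" concrete, here is a minimal sketch, not the code used to build DocRED. It assumes you have the JSON entity dump, in which the whole file is one large JSON array with one entity per line, and that statements whose value is another item carry that item's id in `datavalue.value.id`. The function name `iter_triples` and the file name in the usage note are placeholders chosen for illustration.

```python
import json


def iter_triples(lines):
    """Yield (subject_id, property_id, object_id) for item-valued statements.

    `lines` is any iterable of text lines from a Wikidata JSON entity dump
    (assumed layout: "[" on the first line, one entity per line with a
    trailing comma, "]" on the last line).
    """
    for raw in lines:
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue  # skip the array brackets and blank lines
        entity = json.loads(line)
        subject = entity.get("id")
        for prop, statements in entity.get("claims", {}).items():
            for statement in statements:
                snak = statement.get("mainsnak", {})
                if snak.get("snaktype") != "value":
                    continue  # "novalue" / "somevalue" carry no object
                datavalue = snak.get("datavalue", {})
                if datavalue.get("type") != "wikibase-entityid":
                    continue  # dates, strings, quantities are skipped here
                obj = datavalue["value"].get("id")
                if obj is not None:
                    yield subject, prop, obj
```

Because the dump is very large, it is probably best streamed rather than loaded whole, for example with `bz2.open("wikidata-all.json.bz2", "rt", encoding="utf-8")` passed directly to `iter_triples` (the file name is a placeholder). Note that this sketch only keeps entity-to-entity statements; time-valued properties such as dates of birth would need separate handling.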

caoh18 commented 5 years ago

> You can download the wikidata dump from https://dumps.wikimedia.org/wikidatawiki/20190701/ to get all triples.

Then which file should I download? There are so many different types that I am confused.

shacharosn commented 5 years ago

This may be helpful:

{"P1376": "capital of", "P607": "conflict", "P136": "genre", "P137": "operator", "P131": "located in the administrative territorial entity", "P527": "has part", "P1412": "languages spoken, written or signed", "P206": "located in or next to body of water", "P205": "basin country", "P449": "original network", "P127": "owned by", "P123": "publisher", "P86": "composer", "P840": "narrative location", "P355": "subsidiary", "P737": "influenced by", "P740": "location of formation", "P190": "twinned administrative body", "P576": "dissolved, abolished or demolished", "P749": "parent organization", "P112": "founded by", "P118": "league", "P17": "country", "P19": "place of birth", "P3373": "sibling", "P6": "head of government", "P276": "location", "P1001": "applies to jurisdiction", "P580": "start time", "P582": "end time", "P585": "point in time", "P463": "member of", "P676": "lyrics by", "P674": "characters", "P264": "record label", "P108": "employer", "P102": "member of political party", "P25": "mother", "P27": "country of citizenship", "P26": "spouse", "P20": "place of death", "P22": "father", "P807": "separated from", "P800": "notable work", "P279": "subclass of", "P1336": "territory claimed by", "P577": "publication date", "P570": "date of death", "P571": "inception", "P178": "developer", "P179": "part of the series", "P272": "production company", "P170": "creator", "P171": "parent taxon", "P172": "ethnic group", "P175": "performer", "P176": "manufacturer", "P39": "position held", "P30": "continent", "P31": "instance of", "P36": "capital", "P37": "official language", "P35": "head of state", "P400": "platform", "P403": "mouth of the watercourse", "P361": "part of", "P364": "original language of film or TV show", "P569": "date of birth", "P710": "participant", "P1344": "participant of", "P488": "chairperson", "P241": "military branch", "P162": "producer", "P161": "cast member", "P166": "award received", "P40": "child", "P1441": "present in work", "P156": "followed by", 
"P155": "follows", "P150": "contains administrative territorial entity", "P551": "residence", "P706": "located on terrain feature", "P159": "headquarters location", "P495": "country of origin", "P58": "screenwriter", "P194": "legislative body", "P54": "member of sports team", "P57": "director", "P50": "author", "P1366": "replaced by", "P1365": "replaces", "P937": "work location", "P140": "religion", "P69": "educated at", "P1198": "unemployment rate", "P1056": "product or material produced"}