usc-isi-i2 / etk

Extraction Toolkit
https://etk.readthedocs.io/en/development/
MIT License

Extend supported data types #418

Open rongpenl opened 4 years ago

rongpenl commented 4 years ago

Extend support for additional data types of [properties](https://rawgit.com/johnsamuelwrites/wdprop/master/datatypes.html), such as WikibaseProperty.

Extend ETK so that a property can appear as the value of another property. For example, we want to use https://www.wikidata.org/wiki/Property:P1687 as a label whose value is another property, e.g.:

node1 label node2
Q123 P1687 P1234

ETK does not currently allow the data type of a value to be Property.
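To illustrate the request, here is a minimal sketch in plain Python of how an edge's value could be resolved once a property is allowed as a value. It does not use the ETK API. The lookup table `PROPERTY_DATATYPES`, the helper `resolve_value`, and the fallback to `string` for unknown labels are all assumptions made for illustration.

```python
import re

# Base URI that Wikidata uses for entities (items and properties alike).
WD_ENTITY = "http://www.wikidata.org/entity/"

# Hypothetical lookup table, for illustration only: property ID -> data type.
# In practice this information would come from a property file.
PROPERTY_DATATYPES = {"P1687": "wikibase-property"}

# A property ID is "P" followed by a positive integer without leading zeros.
PROPERTY_ID = re.compile(r"P[1-9][0-9]*")


def resolve_value(label, node2, datatypes=PROPERTY_DATATYPES):
    """Return (kind, value) for the object of an edge (hypothetical helper)."""
    # Assumption: labels missing from the table are treated as plain strings.
    datatype = datatypes.get(label, "string")
    if datatype == "wikibase-property":
        if PROPERTY_ID.fullmatch(node2) is None:
            raise ValueError(f"{label} expects a property ID, got {node2!r}")
        # A property-valued statement points at the property's entity URI.
        return ("uri", WD_ENTITY + node2)
    return ("literal", node2)


print(resolve_value("P1687", "P1234"))
```

Under these assumptions, the edge `Q123 P1687 P1234` would yield a URI value rather than a string literal, which is the behavior the issue asks for.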

szeke commented 4 years ago

@rongpenl can provide assistance

szeke commented 4 years ago

@saggu work on this after the experiments for the paper

rongpenl commented 4 years ago

@saggu It may not be necessary to make changes in ETK. I created a new branch of kgtk, https://github.com/usc-isi-i2/kgtk/tree/enhancement/data_type, where I explicitly allow a WikibaseProperty to be created as the WDProperty defined in ETK. Please use the new wikidataProperty.tsv in the data directory as the property file for testing. 13 properties now have the data type property instead of string; P1963 is one of them.

Test input file:

node1 property node2 id
Q1 P1963 P1985 fakeid
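A quick sanity check of a test input like this could look like the sketch below. It assumes the file is tab-separated with the four columns shown; the set `PROPERTY_TYPED` and the helpers `read_edges` and `check_edges` are hypothetical names and are not part of kgtk or ETK.

```python
import csv
import io
import re

# The test input from this comment, assumed to be tab-separated.
SAMPLE = "node1\tproperty\tnode2\tid\nQ1\tP1963\tP1985\tfakeid\n"

# Hypothetical subset of the properties whose data type is property.
PROPERTY_TYPED = {"P1963"}

PROPERTY_ID = re.compile(r"P[1-9][0-9]*")


def read_edges(text):
    """Parse tab-separated edge rows into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def check_edges(edges, property_typed=PROPERTY_TYPED):
    """Return ids of edges whose node2 should be a property ID but is not."""
    return [
        edge["id"]
        for edge in edges
        if edge["property"] in property_typed
        and PROPERTY_ID.fullmatch(edge["node2"]) is None
    ]


bad_ids = check_edges(read_edges(SAMPLE))
print("invalid edges:", bad_ids)
```

An empty list means every property-typed edge in the input has a well-formed property ID as its value.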

I tested triple creation as well as importing into Blazegraph, and both work fine. Once you have tested it, you can update the unit test and merge the branch.

Hope this helps.