usc-isi-i2 / kgtk

Knowledge Graph Toolkit
https://kgtk.readthedocs.io/en/latest/
MIT License

Generalize `kgtk lift` to lift multiple relations concurrently. #177

Open CraigMiloRogers opened 4 years ago

CraigMiloRogers commented 4 years ago

From @szeke

I think lift should be generic too. Let's use the word relation for node1/relation/node2 so we don't get confused with the ambiguous meaning of label.

The lift command should accept a list of symbols that can appear in the relation column and lift them as headers.

Suppose I have edges as:

id1/n1/rel/o1
id2/n2/rel/o2
id3/id1/rel/o3

We will get 3 new columns:

node1;rel
node2;rel
rel

The above 3 edges will disappear from the file because we are lifting rel. Where will the info go?

o1 goes in node1;rel, o2 goes in node1;rel, o3 goes in rel

CraigMiloRogers commented 3 years ago

"o2 goes in node1;rel": I suspect this should be "o2 goes in node2;rel".
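
One possible reading of the proposal, with the node2;rel correction applied, is sketched below in plain Python. This is not the kgtk implementation: the function name `lift`, the dict-based edge representation, the `|` separator for multiple values, and the extra edges `id4` and `id5` are all assumptions added for illustration (the original three edges all carry `rel`, so none of them would survive to show the new columns).

```python
from collections import defaultdict


def lift(edges, relations):
    """Lift the given relations out of `edges` into new columns.

    `edges` is a list of dicts with keys id, node1, relation, node2.
    Edges whose relation is being lifted are removed from the output.
    """
    relations = list(relations)
    # For each lifted relation, map subject (node or edge id) -> values.
    values = {rel: defaultdict(list) for rel in relations}
    kept = []
    for edge in edges:
        if edge["relation"] in values:
            values[edge["relation"]][edge["node1"]].append(edge["node2"])
        else:
            kept.append(dict(edge))

    for edge in kept:
        for rel in relations:
            lookup = values[rel]
            # Value of `rel` for the edge's node1 and node2.
            edge[f"node1;{rel}"] = "|".join(lookup.get(edge["node1"], []))
            edge[f"node2;{rel}"] = "|".join(lookup.get(edge["node2"], []))
            # Value of `rel` attached to the edge itself, via its id.
            edge[rel] = "|".join(lookup.get(edge["id"], []))
    return kept


edges = [
    {"id": "id1", "node1": "n1", "relation": "rel", "node2": "o1"},
    {"id": "id2", "node1": "n2", "relation": "rel", "node2": "o2"},
    {"id": "id3", "node1": "id1", "relation": "rel", "node2": "o3"},
    # Hypothetical edges, not in the original example:
    {"id": "id4", "node1": "n1", "relation": "knows", "node2": "n2"},
    {"id": "id5", "node1": "id4", "relation": "rel", "node2": "o5"},
]

result = lift(edges, ["rel"])
for row in result:
    print(row)
```

Under this reading, o3 is keyed by id1, which is itself a lifted edge, so it has no surviving row to land on. Whether such values should be kept somewhere is left open in the proposal.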

CraigMiloRogers commented 3 years ago

The branch issue/177/generalize-lift is ready to merge into dev.

CraigMiloRogers commented 3 years ago

Bug: not all lifts are taking place.

119% kgtk lift --input-file examples/docs/lift-file12.tsv --columns-to-write 'node1;label' 'label;label' 'node2;label' 'node1;P2' 'node2;P2' --remove-label-records False
node1   label   node2   node1;label     label;label     node2;label     node1;P2        node2;P2
Q31     P1      Q32     "Big Buyer"@en  "buys from"@en  "Big Seller"@en Q41     Q42
Q31     label   "Big Buyer"@en
Q32     label   "Big Seller"@en
P1      label   "buys from"@en
Q31     P2      Q41
Q32     P2      Q42
P2      label   "business category"
Q41     label   "retail"
Q42     label   "wholesale"

Where are the node1;label and node2;label values in the P2 edges?

CraigMiloRogers commented 3 years ago

This works:

kgtk lift --input-file examples/docs/lift-file12.tsv --columns-to-write 'node1;label' 'label;label' 'node2;label' --remove-label-records False
node1   label   node2   node1;label     label;label     node2;label
Q31     P1      Q32     "Big Buyer"@en  "buys from"@en  "Big Seller"@en
Q31     label   "Big Buyer"@en
Q32     label   "Big Seller"@en
P1      label   "buys from"@en
Q31     P2      Q41     "Big Buyer"@en  "business category"     "retail"
Q32     P2      Q42     "Big Seller"@en "business category"     "wholesale"
P2      label   "business category"
Q41     label   "retail"
Q42     label   "wholesale"

CraigMiloRogers commented 3 years ago

Ah, I understand: label rows are flagged not to be lifted.

1) Add a boolean option to change that policy.
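
The policy described here can be modelled in a few lines of Python. This is a rough sketch and not the kgtk code: the function `lift` and the boolean `lift_value_rows` are hypothetical names, and the sketch assumes that every row whose label is one of the lifted relations counts as a "label row". The rows are taken from the lift-file12.tsv output shown earlier in the thread.

```python
def lift(rows, columns, lift_value_rows=False):
    """Add lifted columns such as 'node1;label' to each (node1, label, node2) row.

    Rows that supply lifted values (label in the lifted relations) are
    left unlifted unless `lift_value_rows` (a hypothetical option) is True.
    """
    # 'node1;P2' -> ('node1', 'P2'): source column and relation to lift.
    specs = [tuple(column.split(";", 1)) for column in columns]
    relations = {rel for _, rel in specs}

    # Collect the value of each lifted relation per subject.
    values = {rel: {} for rel in relations}
    for node1, label, node2 in rows:
        if label in relations:
            values[label][node1] = node2

    out = []
    for node1, label, node2 in rows:
        record = {"node1": node1, "label": label, "node2": node2}
        skip = label in relations and not lift_value_rows
        for column, (source, rel) in zip(columns, specs):
            record[column] = "" if skip else values[rel].get(record[source], "")
        out.append(record)
    return out


rows = [
    ("Q31", "P1", "Q32"),
    ("Q31", "label", '"Big Buyer"@en'),
    ("Q32", "label", '"Big Seller"@en'),
    ("P1", "label", '"buys from"@en'),
    ("Q31", "P2", "Q41"),
    ("Q32", "P2", "Q42"),
    ("P2", "label", '"business category"'),
    ("Q41", "label", '"retail"'),
    ("Q42", "label", '"wholesale"'),
]
columns = ["node1;label", "label;label", "node2;label", "node1;P2", "node2;P2"]

default = lift(rows, columns)                        # P2 rows stay unlifted
lifted = lift(rows, columns, lift_value_rows=True)   # P2 rows get labels too
print(default[4])
print(lifted[4])
```

With the default, the `Q31 P2 Q41` row gets empty lifted columns, which matches the behaviour reported as a bug above. With the option enabled, the same row picks up its node1, label, and node2 labels.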