How to use the graph data

meiru-cam commented 1 year ago

Hi, the graph data you provided only contains entities. How do I construct a knowledge graph from these entities? What are the relation edges?

meiru-cam commented 1 year ago

I also tried using the provided scripts to process the knowledge graph data; however, the generated triples seem to be incorrect.

FrederickXZhang commented 1 year ago

Hi, thanks for your interest in our work! All of these triplets are extracted by an off-the-shelf tool. After seeing some noisy triplet outputs, we decided to discard those that contain long spans. Specifically, as stated in our paper, "Triplets whose span is longer than 15 tokens are dropped". Triples are extracted in the form <subject, predicate, object>, and then, as described in our paper, "We then add directed edges from subject to predicate and from predicate to object. We add reverse edges and self-loops to enhance graph connectivity and improve information flow". In other words, we don't use the predicates as relation edges; each predicate is a node in its own right. Every edge between two nodes in our constructed knowledge graph is unlabeled and carries no attributes.
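For readers following along, the construction described above can be sketched roughly as follows. This is an illustrative sketch, not code from the repository: the function name `build_graph`, the whitespace tokenization, the choice to apply the 15-token limit to each of the three spans separately, and the merging of nodes that share the same text are all assumptions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
NodeKey = Tuple[str, str]       # (node kind, surface text)

MAX_SPAN_TOKENS = 15  # threshold quoted from the paper


def build_graph(triples: List[Triple],
                max_span_tokens: int = MAX_SPAN_TOKENS):
    """Build an unlabeled graph from OpenIE-style triples.

    Assumptions (not confirmed by the source): spans are tokenized on
    whitespace, a triple is dropped if any one of its spans is too long,
    and nodes with identical kind and text are merged.
    """
    nodes: List[NodeKey] = []
    index: Dict[NodeKey, int] = {}
    edges: Set[Tuple[int, int]] = set()

    def node_id(kind: str, text: str) -> int:
        key = (kind, text)
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
        return index[key]

    for subj, pred, obj in triples:
        # Drop noisy triples that contain a long span.
        if any(len(span.split()) > max_span_tokens
               for span in (subj, pred, obj)):
            continue
        s = node_id("entity", subj)
        p = node_id("predicate", pred)
        o = node_id("entity", obj)
        # Directed edges subject -> predicate and predicate -> object,
        # plus their reverse edges. No edge carries a label.
        for src, dst in ((s, p), (p, o)):
            edges.add((src, dst))
            edges.add((dst, src))

    # Self-loops on every node.
    for i in range(len(nodes)):
        edges.add((i, i))

    return nodes, sorted(edges)


example_triples = [
    ("Trump", "visited", "Paris"),
    ("Paris", "is capital of", "France"),
    (" ".join(["w"] * 16), "is", "too long"),  # dropped: 16-token subject
]
nodes, edges = build_graph(example_triples)
```

Note that the predicate text ends up in a node, so the edge list is just pairs of node indices, which is the input a plain GAT layer expects.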

meiru-cam commented 1 year ago

Hi Frederick, thank you for the reply. Just to clarify: do the edges only show connectivity between <subject, predicate, object>, where the predicate is usually a verb?

meiru-cam commented 1 year ago

Is there any linking between subjects, or does a link exist only when the <subject of triple 1> is also the <object of triple 2>?

FrederickXZhang commented 1 year ago

Hi, please find answers below:

1) Do edges only show connectivity between <subject, predicate, object>? => Your understanding is correct. Accordingly, the underlying GNN model is GAT rather than R-GAT, since we don't use edges to encode predicate information in our graph. Another practical reason for using plain connectivity is that we work with OpenIE systems, which can yield a very large number of relations between two entities, making it prohibitively difficult to use R-GAT to obtain node representations.

2) Is the predicate usually a verb? => Most of the time, yes. For your reference, the AllenNLP model we adopted to extract IE triplets is identical to the one in their online demo.

3) Is there any linking between subjects? => Yes. In fact, we have four types of nodes in the graph: entity nodes, entity mention nodes, predicate nodes and wiki nodes. The majority of the links are <subject entity node, predicate node> and <predicate node, object entity node>. Further, to enhance information flow and ground the graph in the surface mentions in the text (i.e., the exact entity mentions that appear in the text, e.g., Trump), we also connect entity nodes and entity mention nodes.

Thanks again for your interest in this work.
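To make the node types in the answer above concrete, here is a small sketch of a typed, unlabeled graph. It is an illustration only: the class `TypedGraph`, the helpers `add_triple` and `add_mention`, and the canonical entity name "Donald Trump" are hypothetical and do not come from the repository. Wiki nodes are declared as a type but left unconnected because the thread does not say how they are linked, and self-loops are omitted for brevity.

```python
from enum import Enum
from typing import Dict, List, Set, Tuple


class NodeType(Enum):
    ENTITY = "entity"
    MENTION = "entity_mention"
    PREDICATE = "predicate"
    WIKI = "wiki"  # declared only; its linking is not described in the thread


class TypedGraph:
    """Nodes carry a type; edges are plain (src, dst) pairs with no label."""

    def __init__(self) -> None:
        self.node_types: List[NodeType] = []
        self.labels: List[str] = []
        self.edges: Set[Tuple[int, int]] = set()
        self._index: Dict[Tuple[NodeType, str], int] = {}

    def add_node(self, kind: NodeType, label: str) -> int:
        key = (kind, label)
        if key not in self._index:
            self._index[key] = len(self.labels)
            self.node_types.append(kind)
            self.labels.append(label)
        return self._index[key]

    def connect(self, a: int, b: int) -> None:
        # Add the edge and its reverse; neither carries an attribute.
        self.edges.add((a, b))
        self.edges.add((b, a))


def add_triple(graph: TypedGraph, subj: str, pred: str, obj: str) -> None:
    s = graph.add_node(NodeType.ENTITY, subj)
    p = graph.add_node(NodeType.PREDICATE, pred)
    o = graph.add_node(NodeType.ENTITY, obj)
    graph.connect(s, p)
    graph.connect(p, o)


def add_mention(graph: TypedGraph, entity: str, mention: str) -> None:
    # Ground an entity node in a surface mention from the text.
    e = graph.add_node(NodeType.ENTITY, entity)
    m = graph.add_node(NodeType.MENTION, mention)
    graph.connect(e, m)


g = TypedGraph()
add_triple(g, "Donald Trump", "visited", "Paris")
add_mention(g, "Donald Trump", "Trump")
```

Because node types live on the nodes and not on the edges, the edge set stays a plain connectivity structure, consistent with using GAT rather than R-GAT.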

launchnlp / SEESAW
