xiangwang1223 / knowledge_graph_attention_network

KGAT: Knowledge Graph Attention Network for Recommendation, KDD2019

How does the directed graph propagation work? #2

yuanyuansiyuan opened this issue 5 years ago

yuanyuansiyuan commented 5 years ago

In your work, the ego-network of an entity h treats h as the head entity, i.e., its neighbors are the tail entities it points to. How does the following example work, then?

Page 5: u2 -> i2 -> e1 -> i1 -> u1, where u2 is in u1's 3-layer neighborhood.

What does the minus relation -r1 mean?

xiangwang1223 commented 5 years ago

Hi, -r1 means the inverse direction of r1. For example, if r1 denotes the relation "Interact", then -r1 denotes the relation "IsInteractedBy". Thanks for your interest.
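In code terms, this is roughly what the augmentation looks like (a minimal sketch, assuming triples are stored as (head, relation, tail) ID tuples; the function name and ID convention are illustrative, not the repo's exact loader):

```python
# Minimal sketch (not the repo's exact loader): augmenting a directed KG so that
# every triple (h, r, t) also appears in the inverse direction (t, -r, h).
# Here "-r" is encoded as a new relation ID offset by n_relations.

def add_inverse_relations(triples, n_relations):
    """triples: list of (head_id, relation_id, tail_id) tuples."""
    augmented = list(triples)
    for h, r, t in triples:
        # r + n_relations plays the role of -r, e.g. "Interact" -> "IsInteractedBy"
        augmented.append((t, r + n_relations, h))
    return augmented

# Toy example: relation 0 = "Interact" (u1 -> i1); its inverse gets ID 0 + 1 = 1.
kg = add_inverse_relations([(0, 0, 10)], n_relations=1)
print(kg)  # [(0, 0, 10), (10, 1, 0)]  i.e. i1 --IsInteractedBy--> u1
```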

yuanyuansiyuan commented 5 years ago

Thank you for your explanation. In your work, the neighbors of each entity are its target (tail) nodes. So if u1 is a 4-layer neighbor of u2, is the information from u1 encoded in e_u2^(4)? And how is the inverse relation modeled in the training process, given that you say the CKG is a directed graph?

yuanyuansiyuan commented 5 years ago

I just don't understand the propagation process in the CKG from this example.

yuanyuansiyuan commented 5 years ago

Another question: for the CF BPR loss part, does the propagation start only from the user and item nodes that appear in the user-item interactions, so that not all nodes in the CKG are activated in the GCN? But in the TransR part, all triples in the CKG are considered?

Have you also tried using the GCN in the TransR part? If the h, r, t embeddings in TransR were also generated by the GCN propagation process, then the GCN could traverse the whole CKG.

Looking forward to your comment, thank you.

xiangwang1223 commented 5 years ago

Hi,

  1. The CKG is a directed graph, and we model the relations between any two nodes in both the canonical and inverse directions. Namely, in the CKG there exists r1 from u1 to i1, as well as -r1 from i1 to u1; they are two different relations. For more information, please refer to Section 2 and the code.

  2. As for the example, please refer to the analogous instance in Figure 3 of our NGCF paper (SIGIR 2019); see also the toy sketch after this list.

  3. The BPR loss provides the supervision signals from the user-item interactions and is built upon the outputs of the propagation process. For the KGE part, TransR serves more as a regularization term, regularizing the initial representations.

  4. Thanks for your insightful suggestion. We actually tried to apply TransR to the GCN-output representations. It performs slightly better, but at the cost of training time and model complexity.
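To make points 1 and 2 concrete, here is a toy sketch (an illustrative graph and a plain BFS, not the repo's code) of how u2's information can reach u1 only once the inverse relations are added to the directed CKG:

```python
# Toy sketch (illustrative graph, not the repo's data or code): why u2's signal can
# reach u1 only after inverse edges are added to the directed CKG.
from collections import deque

# Canonical (directed) triples from the example: users interact with items,
# and both items point to the shared attribute entity e1.
canonical = [("u1", "r1", "i1"), ("u2", "r1", "i2"),
             ("i1", "r2", "e1"), ("i2", "r2", "e1")]

# Add the inverse direction for every triple, as described in point 1 above.
edges = canonical + [(t, "-" + r, h) for h, r, t in canonical]

adj = {}
for h, r, t in edges:
    adj.setdefault(h, []).append(t)

def hop_distances(start):
    """BFS over the augmented graph; hop k = nodes whose messages can reach
    `start` after k propagation layers."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

print(hop_distances("u1"))
# {'u1': 0, 'i1': 1, 'e1': 2, 'i2': 3, 'u2': 4}
# The hops beyond e1 exist only because of the inverse relations (-r2, -r1);
# with canonical edges alone, i2 and u2 would be unreachable from u1.
```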

ECNU109 commented 5 years ago

Hi, I am curious about the top-K setting. You report the results for K = 20. How about the results when K is smaller, like 5 or 10? Thank you.

xiangwang1223 commented 5 years ago

Hi, we have only run the experiments with K set to 20, 40, 60, 80, and 100. You can try smaller K by changing the hyperparameter Ks (e.g., --Ks [1,2,3,5,10]). Thanks for your interest.
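For reference, each value in Ks is just the cutoff of the ranked list used for the metrics. A simplified sketch of recall@K (not the repo's exact evaluation code; the toy ranking below is made up):

```python
# Simplified sketch of what each value in Ks controls: the cutoff of the
# ranked recommendation list used when computing recall@K.

def recall_at_k(ranked_items, ground_truth, k):
    """ranked_items: item IDs sorted by predicted score; ground_truth: set of held-out items."""
    hits = len(set(ranked_items[:k]) & ground_truth)
    return hits / len(ground_truth)

ranked = [7, 3, 11, 5, 2, 9, 4, 8, 1, 6]   # toy ranking for one user
truth = {3, 9, 6}
for k in [5, 10, 20]:                       # i.e. what --Ks [5,10,20] would sweep over
    print(k, recall_at_k(ranked, truth, k))
```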

ECNU109 commented 5 years ago

Hi, for Figure 3(b), why do the results decrease a lot even though the density increases?

xiangwang1223 commented 5 years ago

Thanks for your comment. We are currently looking into this phenomenon but do not yet have an exact idea of why it happens.

BodyCSoulN commented 2 years ago

> Thank you for your explanation. In your work, the neighbors of each entity are its target (tail) nodes. So if u1 is a 4-layer neighbor of u2, is the information from u1 encoded in e_u2^(4)? And how is the inverse relation modeled in the training process, given that you say the CKG is a directed graph?

Hi, have you resolved your question? I don't understand the example either.