implementaion about knowledge graph attention

junkangwu commented 3 years ago

From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.

❓ Questions & Help

Hi, I'd like to implement the function as follows: That is, I need to calculate the attention alpha between the head entity and relations, and need to calculate the attention belta between relation and tail entity. So, How can I use pyg to implement attention score in terms of a part of neighbors, not all neighbors? Note: I can implement it in the numerator, but it's not clear how do I compute the denominator The above is the explanation of the Dimension of the tensor in the softmax function where num_edge is the number of all edge_index. Looking forward to your help! Thanks a lot!

rusty1s commented 3 years ago

You can filter your index via edge_type holding the relation type information, e.g.:

    for i in range(num_relations):
        mask = edge_type == i
        self.propagate(edge_index[:, mask], x=x, relation=i)

def message(self, x_j, index, relation):
     alpha = ...  # Compute attention scores
     alpha = softmax(alpha, index)
     ...

junkangwu commented 3 years ago

Thanks a lot, I understand your idea now!

junkangwu commented 3 years ago

@rusty1s By the way, I find it difficult to compute alpha and beta at the same time. If I filter index via edge_type, that is I focus on the subgraph about relation_i. Then beta attention could be implemented while alpha is still hard to address. For example, if I mask edge_type i, then in the self.propagate, the edge_index is all about r_i. So as for attention alpha, I couldn't know the other relations which are the neighbors about entity u.

The desired final result shows below: For example: attention_{e_1,e_2} = alpha_{e_1,r_1} * beta_{r_1,e_2} Looking forward to your help again!

rusty1s commented 3 years ago

I see. You can also do the filtering directly inside message:

    self.propagate(edge_index, x=x, edge_type=edge_type)

def message(self, x_i, x_j, index, edge_type):
    # Compute alpha based on `index`
    for i in range(self.num_relations):
        cur_index = index[edge_type == i]
        # Compute beta based on `cur_index` for relational type `i`

junkangwu commented 3 years ago

cool, In #Compute beta based on cur_index for relational type i, by using torch.where to fill the values into the specific indices in order to form a complete matrix. Do you agree?
By the way please, it seems that in the procedure of # Compute alpha based on index. It is very easy to appear the same entity, the same edge, attention double calculation. For example, in the case above figure: alpha_{e_1, r_1}=2 * value_{e_1, r_1} / (2 * value_{e_1, r_1} + 1 * value_{e_1, r_2} + 3 * value_{e_1, r_3)) But desired result is : alpha_{e_1, r_1} = value_{e_1, r_1} / (value_{e_1, r_1} + value_{e_1, r_2} + value_{e_1, r_3))

I'm sorry to trouble you lots of times as a green hand, Thanks a lot in advance. @rusty1s

rusty1s commented 3 years ago

Yes, I agree.
Alpha is indeed a bit tricky to compute. I would probably go with re-weighting attention-coefficients based on the number of equally typed relations in a neighborhood. Does that make sense to you?

junkangwu commented 3 years ago

Thanks a lot, I will have a try. It seems that indeed a bit tricky to compute.

junkangwu commented 3 years ago

Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function：

  from torch_scatter import scatter
  def softmax(src, index, N, index2=None, ptr=None):
      if ptr is not None:
          src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr)
          out = (src - src_max).exp()
          out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr)

      elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out
          src_max = scatter(src, index, dim_size=N, reduce='max')[index2]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2]

      elif index is not None:
          src_max = scatter(src, index, dim_size=N, reduce='max')[index]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index]
      else:
          raise NotImplementedError
    return out / (out_sum + 1e-16)

The detailed attention computing procedure as follows:

import torch
import torch. nn as nn
edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
        [1, 4, 2, 3, 5, 6, 3]])
# edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0],
#         [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])
r = torch.randn(4, 4) # embedding for relation_type
h = torch.randn(7, 4) # embedding for entity type
weight = torch.randn(8, 1) 
dict_rt = {}
index_rt = []
cnt = 0
# prepare for beta
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt:
        index_rt.append(cnt)
        dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the  (head, relation)
beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1]
beta = nn.LeakyReLU(0.2)(beta)
print('beta before')
print(beta)
beta = softmax(beta, torch.LongTensor(index_rt), 7)
print('beta after')
print(beta)
# prepare for alpha
index_hr = []
dict_hr = {}
cnt = 0
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr:
        index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head)
        dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value
alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1]
alpha = nn.LeakyReLU(0.2)(alpha)
print('alpha before')
print(alpha)
alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0])
print('alpha after')
print(alpha)

Focusing on the example in the figure, the final result is :

beta before
tensor([[-0.4961],
        [-0.3258],
        [ 2.4938],
        [ 3.6552],
        [ 3.0433],
        [ 2.1643],
        [ 3.5956]])
beta after
tensor([[0.4575],
        [0.5425],
        [1.0000],
        [0.5658],
        [0.3068],
        [0.1274],
        [1.0000]])
alpha before
tensor([[ 0.0215],
        [ 0.0215],
        [-0.3617],
        [-0.0067],
        [-0.0067],
        [-0.0067],
        [ 4.2091]])
alpha after
tensor([[0.3768],
        [0.3768],
        [0.2569],
        [0.3663],
        [0.3663],
        [0.3663],
        [1.0000]])

Overall, Thanks a lot for your patience with sharing!

yuto3o commented 3 years ago

Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function：

  from torch_scatter import scatter
  def softmax(src, index, N, index2=None, ptr=None):
      if ptr is not None:
          src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr)
          out = (src - src_max).exp()
          out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr)

      elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out
          src_max = scatter(src, index, dim_size=N, reduce='max')[index2]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2]

      elif index is not None:
          src_max = scatter(src, index, dim_size=N, reduce='max')[index]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index]
      else:
          raise NotImplementedError
    return out / (out_sum + 1e-16)

The detailed attention computing procedure as follows:

import torch
import torch. nn as nn
edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
        [1, 4, 2, 3, 5, 6, 3]])
# edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0],
#         [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])
r = torch.randn(4, 4) # embedding for relation_type
h = torch.randn(7, 4) # embedding for entity type
weight = torch.randn(8, 1) 
dict_rt = {}
index_rt = []
cnt = 0
# prepare for beta
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt:
        index_rt.append(cnt)
        dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the  (head, relation)
beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1]
beta = nn.LeakyReLU(0.2)(beta)
print('beta before')
print(beta)
beta = softmax(beta, torch.LongTensor(index_rt), 7)
print('beta after')
print(beta)
# prepare for alpha
index_hr = []
dict_hr = {}
cnt = 0
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr:
        index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head)
        dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value
alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1]
alpha = nn.LeakyReLU(0.2)(alpha)
print('alpha before')
print(alpha)
alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0])
print('alpha after')
print(alpha)

Focusing on the example in the figure, the final result is :

beta before
tensor([[-0.4961],
        [-0.3258],
        [ 2.4938],
        [ 3.6552],
        [ 3.0433],
        [ 2.1643],
        [ 3.5956]])
beta after
tensor([[0.4575],
        [0.5425],
        [1.0000],
        [0.5658],
        [0.3068],
        [0.1274],
        [1.0000]])
alpha before
tensor([[ 0.0215],
        [ 0.0215],
        [-0.3617],
        [-0.0067],
        [-0.0067],
        [-0.0067],
        [ 4.2091]])
alpha after
tensor([[0.3768],
        [0.3768],
        [0.2569],
        [0.3663],
        [0.3663],
        [0.3663],
        [1.0000]])

Overall, Thanks a lot for your patience with sharing!

Hello, I want to reproduce RGHAT. Do you reproduce the result successfully? I think that computing alpha and beta at the same time based on PyG is still hard.

junkangwu commented 2 years ago

Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function：

  from torch_scatter import scatter
  def softmax(src, index, N, index2=None, ptr=None):
      if ptr is not None:
          src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr)
          out = (src - src_max).exp()
          out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr)

      elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out
          src_max = scatter(src, index, dim_size=N, reduce='max')[index2]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2]

      elif index is not None:
          src_max = scatter(src, index, dim_size=N, reduce='max')[index]
          out = (src - src_max).exp()
          out_sum = scatter(out, index, dim_size=N, reduce='sum')[index]
      else:
          raise NotImplementedError
    return out / (out_sum + 1e-16)

The detailed attention computing procedure as follows:

import torch
import torch. nn as nn
edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
        [1, 4, 2, 3, 5, 6, 3]])
# edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0],
#         [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])
r = torch.randn(4, 4) # embedding for relation_type
h = torch.randn(7, 4) # embedding for entity type
weight = torch.randn(8, 1) 
dict_rt = {}
index_rt = []
cnt = 0
# prepare for beta
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt:
        index_rt.append(cnt)
        dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the  (head, relation)
beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1]
beta = nn.LeakyReLU(0.2)(beta)
print('beta before')
print(beta)
beta = softmax(beta, torch.LongTensor(index_rt), 7)
print('beta after')
print(beta)
# prepare for alpha
index_hr = []
dict_hr = {}
cnt = 0
for i in range(edge_index.size(1)):
    if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr:
        index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head)
        dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt
        cnt += 1
    else:
        index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value
alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1]
alpha = nn.LeakyReLU(0.2)(alpha)
print('alpha before')
print(alpha)
alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0])
print('alpha after')
print(alpha)

Focusing on the example in the figure, the final result is :

beta before
tensor([[-0.4961],
        [-0.3258],
        [ 2.4938],
        [ 3.6552],
        [ 3.0433],
        [ 2.1643],
        [ 3.5956]])
beta after
tensor([[0.4575],
        [0.5425],
        [1.0000],
        [0.5658],
        [0.3068],
        [0.1274],
        [1.0000]])
alpha before
tensor([[ 0.0215],
        [ 0.0215],
        [-0.3617],
        [-0.0067],
        [-0.0067],
        [-0.0067],
        [ 4.2091]])
alpha after
tensor([[0.3768],
        [0.3768],
        [0.2569],
        [0.3663],
        [0.3663],
        [0.3663],
        [1.0000]])

Overall, Thanks a lot for your patience with sharing!

Hello, I want to reproduce RGHAT. Do you reproduce the result successfully? I think that computing alpha and beta at the same time based on PyG is still hard.

Sorry to reply so late. However, I failed to reproduce it...

pyg-team / pytorch_geometric

implementaion about knowledge graph attention #2395

❓ Questions & Help