Open junkangwu opened 3 years ago
You can filter your index
via edge_type
holding the relation type information, e.g.:
for i in range(num_relations):
mask = edge_type == i
self.propagate(edge_index[:, mask], x=x, relation=i)
def message(self, x_j, index, relation):
alpha = ... # Compute attention scores
alpha = softmax(alpha, index)
...
Thanks a lot, I understand your idea now!
@rusty1s By the way, I find it difficult to compute alpha
and beta
at the same time. If I filter index
via edge_type
, that is I focus on the subgraph about relation_i
. Then beta
attention could be implemented while alpha
is still hard to address.
For example, if I mask edge_type i
, then in the self.propagate
, the edge_index
is all about r_i
. So as for attention alpha
, I couldn't know the other relations which are the neighbors about entity u
.
The desired final result shows below:
For example:
attention_{e_1,e_2} = alpha_{e_1,r_1} * beta_{r_1,e_2}
Looking forward to your help again!
I see. You can also do the filtering directly inside message
:
self.propagate(edge_index, x=x, edge_type=edge_type)
def message(self, x_i, x_j, index, edge_type):
# Compute alpha based on `index`
for i in range(self.num_relations):
cur_index = index[edge_type == i]
# Compute beta based on `cur_index` for relational type `i`
cool, In #Compute beta based on cur_index
for relational type i
, by using torch.where
to fill the values into the specific indices in order to form a complete matrix. Do you agree?
By the way please, it seems that in the procedure of # Compute alpha based on index
. It is very easy to appear the same entity, the same edge, attention double calculation. For example, in the case above figure:
alpha_{e_1, r_1}=2 * value_{e_1, r_1} / (2 * value_{e_1, r_1} + 1 * value_{e_1, r_2} + 3 * value_{e_1, r_3))
But desired result is :
alpha_{e_1, r_1} = value_{e_1, r_1} / (value_{e_1, r_1} + value_{e_1, r_2} + value_{e_1, r_3))
I'm sorry to trouble you lots of times as a green hand, Thanks a lot in advance. @rusty1s
Thanks a lot, I will have a try. It seems that indeed a bit tricky to compute.
Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function:
from torch_scatter import scatter
def softmax(src, index, N, index2=None, ptr=None):
if ptr is not None:
src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr)
out = (src - src_max).exp()
out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr)
elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out
src_max = scatter(src, index, dim_size=N, reduce='max')[index2]
out = (src - src_max).exp()
out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2]
elif index is not None:
src_max = scatter(src, index, dim_size=N, reduce='max')[index]
out = (src - src_max).exp()
out_sum = scatter(out, index, dim_size=N, reduce='sum')[index]
else:
raise NotImplementedError
return out / (out_sum + 1e-16)
The detailed attention computing procedure as follows:
import torch
import torch. nn as nn
edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2],
[1, 4, 2, 3, 5, 6, 3]])
# edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0],
# [1, 4, 2, 3, 5, 6, 3]])
edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3])
r = torch.randn(4, 4) # embedding for relation_type
h = torch.randn(7, 4) # embedding for entity type
weight = torch.randn(8, 1)
dict_rt = {}
index_rt = []
cnt = 0
# prepare for beta
for i in range(edge_index.size(1)):
if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt:
index_rt.append(cnt)
dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt
cnt += 1
else:
index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the (head, relation)
beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1]
beta = nn.LeakyReLU(0.2)(beta)
print('beta before')
print(beta)
beta = softmax(beta, torch.LongTensor(index_rt), 7)
print('beta after')
print(beta)
# prepare for alpha
index_hr = []
dict_hr = {}
cnt = 0
for i in range(edge_index.size(1)):
if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr:
index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head)
dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt
cnt += 1
else:
index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value
alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1]
alpha = nn.LeakyReLU(0.2)(alpha)
print('alpha before')
print(alpha)
alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0])
print('alpha after')
print(alpha)
Focusing on the example in the figure, the final result is :
beta before
tensor([[-0.4961],
[-0.3258],
[ 2.4938],
[ 3.6552],
[ 3.0433],
[ 2.1643],
[ 3.5956]])
beta after
tensor([[0.4575],
[0.5425],
[1.0000],
[0.5658],
[0.3068],
[0.1274],
[1.0000]])
alpha before
tensor([[ 0.0215],
[ 0.0215],
[-0.3617],
[-0.0067],
[-0.0067],
[-0.0067],
[ 4.2091]])
alpha after
tensor([[0.3768],
[0.3768],
[0.2569],
[0.3663],
[0.3663],
[0.3663],
[1.0000]])
Overall, Thanks a lot for your patience with sharing!
Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function:
from torch_scatter import scatter def softmax(src, index, N, index2=None, ptr=None): if ptr is not None: src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr) out = (src - src_max).exp() out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr) elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out src_max = scatter(src, index, dim_size=N, reduce='max')[index2] out = (src - src_max).exp() out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2] elif index is not None: src_max = scatter(src, index, dim_size=N, reduce='max')[index] out = (src - src_max).exp() out_sum = scatter(out, index, dim_size=N, reduce='sum')[index] else: raise NotImplementedError return out / (out_sum + 1e-16)
The detailed attention computing procedure as follows:
import torch import torch. nn as nn edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2], [1, 4, 2, 3, 5, 6, 3]]) # edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0], # [1, 4, 2, 3, 5, 6, 3]]) edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3]) r = torch.randn(4, 4) # embedding for relation_type h = torch.randn(7, 4) # embedding for entity type weight = torch.randn(8, 1) dict_rt = {} index_rt = [] cnt = 0 # prepare for beta for i in range(edge_index.size(1)): if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt: index_rt.append(cnt) dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt cnt += 1 else: index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the (head, relation) beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1] beta = nn.LeakyReLU(0.2)(beta) print('beta before') print(beta) beta = softmax(beta, torch.LongTensor(index_rt), 7) print('beta after') print(beta) # prepare for alpha index_hr = [] dict_hr = {} cnt = 0 for i in range(edge_index.size(1)): if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr: index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head) dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt cnt += 1 else: index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1] alpha = nn.LeakyReLU(0.2)(alpha) print('alpha before') print(alpha) alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0]) print('alpha after') print(alpha)
Focusing on the example in the figure, the final result is :
beta before tensor([[-0.4961], [-0.3258], [ 2.4938], [ 3.6552], [ 3.0433], [ 2.1643], [ 3.5956]]) beta after tensor([[0.4575], [0.5425], [1.0000], [0.5658], [0.3068], [0.1274], [1.0000]]) alpha before tensor([[ 0.0215], [ 0.0215], [-0.3617], [-0.0067], [-0.0067], [-0.0067], [ 4.2091]]) alpha after tensor([[0.3768], [0.3768], [0.2569], [0.3663], [0.3663], [0.3663], [1.0000]])
Overall, Thanks a lot for your patience with sharing!
Hello, I want to reproduce RGHAT. Do you reproduce the result successfully? I think that computing alpha and beta at the same time based on PyG is still hard.
Thank a lot for your patience. I have tried a trick and address it. The main idea is to control the index in order to using softmax function. We just need to prepare some index in advance which will not affect the training speed. And I made a custom adjustment to your softmax function:
from torch_scatter import scatter def softmax(src, index, N, index2=None, ptr=None): if ptr is not None: src_max = gather_csr(segment_csr(src, ptr, reduce='max'), ptr) out = (src - src_max).exp() out_sum = gather_csr(segment_csr(out, ptr, reduce='sum'), ptr) elif index2 is not None: # as for the repeated (head, relation) which are thrown out, we need to selece them out src_max = scatter(src, index, dim_size=N, reduce='max')[index2] out = (src - src_max).exp() out_sum = scatter(out, index, dim_size=N, reduce='sum')[index2] elif index is not None: src_max = scatter(src, index, dim_size=N, reduce='max')[index] out = (src - src_max).exp() out_sum = scatter(out, index, dim_size=N, reduce='sum')[index] else: raise NotImplementedError return out / (out_sum + 1e-16)
The detailed attention computing procedure as follows:
import torch import torch. nn as nn edge_index = torch.LongTensor([[0, 0, 0, 0, 0, 0, 2], [1, 4, 2, 3, 5, 6, 3]]) # edge_index = torch.LongTensor([[0, 1, 1, 2, 2, 1, 0], # [1, 4, 2, 3, 5, 6, 3]]) edge_type = torch.LongTensor([0, 0, 1, 2, 2, 2, 3]) r = torch.randn(4, 4) # embedding for relation_type h = torch.randn(7, 4) # embedding for entity type weight = torch.randn(8, 1) dict_rt = {} index_rt = [] cnt = 0 # prepare for beta for i in range(edge_index.size(1)): if (edge_index[0, i].item(), edge_type[i].item()) not in dict_rt: index_rt.append(cnt) dict_rt[(edge_index[0, i].item(), edge_type[i].item())] = cnt cnt += 1 else: index_rt.append(dict_rt[(edge_index[0, i].item(), edge_type[i].item())]) # index should be same under the same (head, relation) in order to compute the sum attention under the (head, relation) beta = torch.cat([h[edge_index[1]], r[edge_type]], dim=1) @ weight # attention between tail and relation [num_edge 1] beta = nn.LeakyReLU(0.2)(beta) print('beta before') print(beta) beta = softmax(beta, torch.LongTensor(index_rt), 7) print('beta after') print(beta) # prepare for alpha index_hr = [] dict_hr = {} cnt = 0 for i in range(edge_index.size(1)): if (edge_index[0, i].item(), edge_type[i].item()) not in dict_hr: index_hr.append(edge_index[0, i]) # index should be same under the same (head) in order to compute the sum attention under the (head) dict_hr[(edge_index[0, i].item(), edge_type[i].item())] = cnt cnt += 1 else: index_hr.append(edge_index.size(1)-1) # all the repeated (head, relation) should be thrown out avoid of double counting by giving a maxinmum index value alpha = torch.cat([h[edge_index[0]], r[edge_type]], dim=1) @ weight # attention between head and relation [num_edge 1] alpha = nn.LeakyReLU(0.2)(alpha) print('alpha before') print(alpha) alpha = softmax(alpha, torch.LongTensor(index_hr), 7, edge_index[0]) print('alpha after') print(alpha)
Focusing on the example in the figure, the final result is :
beta before tensor([[-0.4961], [-0.3258], [ 2.4938], [ 3.6552], [ 3.0433], [ 2.1643], [ 3.5956]]) beta after tensor([[0.4575], [0.5425], [1.0000], [0.5658], [0.3068], [0.1274], [1.0000]]) alpha before tensor([[ 0.0215], [ 0.0215], [-0.3617], [-0.0067], [-0.0067], [-0.0067], [ 4.2091]]) alpha after tensor([[0.3768], [0.3768], [0.2569], [0.3663], [0.3663], [0.3663], [1.0000]])
Overall, Thanks a lot for your patience with sharing!
Hello, I want to reproduce RGHAT. Do you reproduce the result successfully? I think that computing alpha and beta at the same time based on PyG is still hard.
Sorry to reply so late. However, I failed to reproduce it...
From now on, we recommend using our discussion forum (https://github.com/rusty1s/pytorch_geometric/discussions) for general questions.
❓ Questions & Help
Hi, I'd like to implement the function as follows: That is, I need to calculate the attention
alpha
between the head entity and relations, and need to calculate the attentionbelta
between relation and tail entity. So, How can I usepyg
to implement attention score in terms of a part of neighbors, not all neighbors? Note: I can implement it in the numerator, but it's not clear how do I compute the denominator The above is the explanation of the Dimension of the tensor in the softmax function where num_edge is the number of all edge_index. Looking forward to your help! Thanks a lot!