Please feel free to open a PR for it. Happy to take any contributions :)
Am I missing something? When I try to push a new branch, I get
remote: Permission to pyg-team/pytorch_geometric.git denied to SZiesche.
fatal: unable to access 'https://github.com/pyg-team/pytorch_geometric.git/': The requested URL returned error: 403
You can fork the repository, make your desired changes, and push them to your forked repository. Afterward, you can open a PR with your changes, and I would be happy to review your contributions.
FYI: https://github.com/pyg-team/pytorch_geometric/blob/master/.github/CONTRIBUTING.md
I have a question about this issue and its resolution (if I should open a new issue instead, please let me know and I will be happy to do so). The new docstring in GATConv and GATv2Conv states that self-loops should use the parameter \Theta_s when computing the forward message passing:

\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}_{s}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}_{t}\mathbf{x}_{j},
However, as far as I can tell, the implementations use \Theta_t for all edges, including self-loops, so it seems the behavior is given by:

\mathbf{x}^{\prime}_i = \alpha_{i,i}\mathbf{\Theta}_{t}\mathbf{x}_{i} + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\mathbf{\Theta}_{t}\mathbf{x}_{j},
Am I misunderstanding something? If not, I'm not sure which of these two is the intended behavior, so I don't know what the resolution should be.
I do find the updated documentation for the attention coefficients to be much clearer, however, so thank you for this change!
I'm not sure what exactly you are referring to, but I can say that a graph with self-loops is not bipartite, and hence \Theta_s = \Theta_t in the case of GATConv. Another thing is that if a node has a self-loop, it is in its own neighbor set and hence is also considered in the sum.
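A minimal sketch illustrating this (the toy graph and values here are my own, not from the docs):

import torch
from torch_geometric.nn.conv import GATConv

# Toy graph: two nodes and a single edge 0 -> 1.
x = torch.tensor([[1.0], [0.0]])
edge_index = torch.tensor([[0], [1]])

# With add_self_loops=True (the default), edges i -> i are inserted before
# message passing, so every node is in its own neighbor set N(i) and
# alpha_{i,i} comes out of the same softmax as the alpha_{i,j}.
conv = GATConv(1, 1)
out, (ei, alpha) = conv(x, edge_index, return_attention_weights=True)
print(ei)     # the original edge plus the self-loops 0 -> 0 and 1 -> 1
print(alpha)  # one attention coefficient per edge, self-loops included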
Does that resolve the confusion?
Thank you, that does resolve my confusion regarding GATConv and the behavior of self-loops, although I am still confused about GATv2Conv. Is \Theta_s = \Theta_t still necessary in the case of a non-bipartite graph in GATv2Conv?
No, it is not necessary. GATv2Conv has a flag share_weights; if you set this to True, the same Linear layer will be used. By default it is False, even if your graph is not bipartite.
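A short sketch of what the flag does (the lin_l / lin_r attribute names are assumed from the example later in this thread and may differ across PyG versions):

from torch_geometric.nn.conv import GATv2Conv

conv_shared = GATv2Conv(1, 1, share_weights=True)
conv_default = GATv2Conv(1, 1)  # share_weights=False by default

# With share_weights=True, the same Linear layer transforms source and
# target nodes, i.e. \Theta_s = \Theta_t:
print(conv_shared.lin_l is conv_shared.lin_r)    # True
print(conv_default.lin_l is conv_default.lin_r)  # False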
In that case, I think my confusion is due to the \alpha_{i,i}. When I have self-loops, each node is included in its own neighbor sum, as you said, but without self-loops it doesn't seem that anything is calculated for the first term? For example, if I set the attention parameter a = [0.0], I would expect the attention weights to be given by softmax(0, ..., 0), and so \alpha_{i,i} = \alpha_{i,j} for each j, correct? But when I try the following example, I get \alpha_{1,0} = 1.0 when I expect \alpha_{1,0} = \alpha_{1,1} = 0.5, so I am still confused:
from collections import OrderedDict

from torch import Tensor
from torch_geometric.nn.conv import GATv2Conv

# Single-head GATv2 layer on 1-dimensional features, without automatically
# added self-loops:
m = GATv2Conv(1, 1, bias=False, add_self_loops=False)
m.load_state_dict(OrderedDict([
    ('att', Tensor([[[0.0]]])),         # attention parameter a = [0.0]
    ('lin_l.weight', Tensor([[0.0]])),
    ('lin_r.weight', Tensor([[1.0]])),
]))
# Two nodes with a single edge 0 -> 1:
m(Tensor([[1.0], [0.0]]), Tensor([[0], [1]]).long(),
  return_attention_weights=True)
Again, my apologies if there is a better place for this discussion.
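For illustration, a minimal sketch: with the same parameters but an explicit self-loop on node 1, the softmax for node 1 runs over its two incoming edges (0 -> 1 and 1 -> 1), and both coefficients come out as 0.5.

from collections import OrderedDict

from torch import Tensor
from torch_geometric.nn.conv import GATv2Conv

m = GATv2Conv(1, 1, bias=False, add_self_loops=False)
m.load_state_dict(OrderedDict([
    ('att', Tensor([[[0.0]]])),
    ('lin_l.weight', Tensor([[0.0]])),
    ('lin_r.weight', Tensor([[1.0]])),
]))
# Same graph as above, plus an explicit self-loop 1 -> 1:
edge_index = Tensor([[0, 1], [1, 1]]).long()
out, (ei, alpha) = m(Tensor([[1.0], [0.0]]), edge_index,
                     return_attention_weights=True)
print(alpha)  # both attention weights for node 1 are 0.5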
📚 Describe the documentation issue
In the documentation there are several hidden subtleties that make it hard to understand what is really going on without looking at the actual implementation. These are:
share_weights is by default False; however, the formulas indicate that it should be True.
Suggest a potential alternative/fix
I suggest explicitly naming the different \Theta matrices used in the different layers with indices s and t (like it is already done in the docs when talking about input sizes). Do the same for a in the GATConv layer. Finally, remove the unclear || operator and just write out the few additions. I could create a PR if you give me some rights to push a branch. Otherwise you could replace the class docstrings by gat_conv.py and gatv2_conv.py.
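For concreteness, a sketch of how the attention formulas might then read with explicit s/t indices and without the || operator (my reading of the suggestion, not the attached files):

For GATConv:
\alpha_{i,j} = \mathrm{softmax}_j\left(\mathrm{LeakyReLU}\left(\mathbf{a}_{s}^{\top}\mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{a}_{t}^{\top}\mathbf{\Theta}_{t}\mathbf{x}_j\right)\right)

For GATv2Conv:
\alpha_{i,j} = \mathrm{softmax}_j\left(\mathbf{a}^{\top}\mathrm{LeakyReLU}\left(\mathbf{\Theta}_{s}\mathbf{x}_i + \mathbf{\Theta}_{t}\mathbf{x}_j\right)\right)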