snap-stanford / deepsnap

Python library assists deep learning on graphs
https://snap.stanford.edu/deepsnap/
MIT License

Architecture of the aggregation in HeteroSAGEConv? #29

Open anniekmyatt opened 3 years ago

anniekmyatt commented 3 years ago

Hello! Not really an issue but I have a question about the implementation of the update step in hetero_gnn.py. What is the benefit of calculating the output via these lines:

```python
aggr_out = self.lin_neigh(aggr_out)
node_feature_self = self.lin_self(node_feature_self)
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
```
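For context, the quoted update step can be sketched as a standalone module (a simplified sketch for illustration; the class name, constructor arguments, and example dimensions are my assumptions, not deepsnap's actual signature):

```python
import torch
import torch.nn as nn

class SAGEUpdateSketch(nn.Module):
    """Simplified sketch of the concat-then-update step quoted above."""
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        # One linear layer per branch, then one on the concatenation.
        self.lin_neigh = nn.Linear(in_channels_neigh, out_channels)
        self.lin_self = nn.Linear(in_channels_self, out_channels)
        self.lin_update = nn.Linear(2 * out_channels, out_channels)

    def forward(self, aggr_out, node_feature_self):
        aggr_out = self.lin_neigh(aggr_out)
        node_feature_self = self.lin_self(node_feature_self)
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)

# Example: 5 nodes, neighbour dim 16, self dim 8, output dim 32.
update = SAGEUpdateSketch(16, 8, 32)
out = update(torch.randn(5, 16), torch.randn(5, 8))
print(out.shape)  # torch.Size([5, 32])
```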

so applying a linear layer to the aggregated neighbour features, another linear layer to the node's own features, and then a third layer to the concatenation of the results? In terms of the weight matrix multiplications this represents something like

$$h_v^{\text{new}} = W_y \,\mathrm{CONCAT}\!\left(W_n\, a_v + b_n,\; W_s\, h_v + b_s\right) + b_y,$$

where $a_v$ denotes the aggregated neighbour feature and $h_v$ the node's own feature.

I thought it would be simpler to use just

```python
aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
aggr_out = self.lin_update(aggr_out)
```

where `self.lin_update` is now initialised as `self.lin_update = nn.Linear(self.in_channels_self + self.in_channels_neigh, self.out_channels)`, and we no longer need the linear layers `self.lin_neigh` and `self.lin_self`?
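The proposed simpler variant would look something like this (again a sketch with assumed names and example dimensions):

```python
import torch
import torch.nn as nn

class SimpleUpdateSketch(nn.Module):
    """Sketch of the proposed simpler update: concat inputs, one linear layer."""
    def __init__(self, in_channels_neigh, in_channels_self, out_channels):
        super().__init__()
        self.lin_update = nn.Linear(in_channels_neigh + in_channels_self,
                                    out_channels)

    def forward(self, aggr_out, node_feature_self):
        aggr_out = torch.cat([aggr_out, node_feature_self], dim=-1)
        return self.lin_update(aggr_out)

update = SimpleUpdateSketch(16, 8, 32)
out = update(torch.randn(5, 16), torch.randn(5, 8))
print(out.shape)  # torch.Size([5, 32])
print(sum(p.numel() for p in update.parameters()))  # (16 + 8) * 32 + 32 = 800
```

With these example dimensions this variant has 800 parameters, while the three-layer version has 544 + 288 + 2,080 = 2,912, so the gap depends on the channel sizes but is indeed not enormous.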

This represents something like

$$h_v^{\text{new}} = W_y' \,\mathrm{CONCAT}(a_v, h_v) + b_y',$$

where CONCAT is the vector concatenation operator, $a_v$ and $h_v$ are the aggregated neighbour feature and the node's own feature, and the prime indicates that $W_y$ and $b_y$ now have a different dimension.

In terms of the number of parameters in the model it doesn't make a huge difference, but by including these additional layers you get a more complex optimisation surface that involves a product of weight matrices. Wouldn't this make it a bit harder for gradient descent to reach a good solution?
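One way to see the "product of weight matrices" point: since there is no nonlinearity between the three linear layers in the quoted update step, their composition is itself a single linear map of the inputs, so the extra layers reparameterise the same function class rather than enlarging it (a nonlinearity applied afterwards, outside the update step, would not change this within the step). A small check of the two-layer case (my own illustration, not deepsnap code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lin_a = nn.Linear(8, 16)
lin_b = nn.Linear(16, 4)

x = torch.randn(10, 8)
composed = lin_b(lin_a(x))

# Collapse the two layers into one equivalent linear map:
# y = B(Ax + a) + b = (BA)x + (Ba + b)
W = lin_b.weight @ lin_a.weight
bias = lin_b.weight @ lin_a.bias + lin_b.bias
collapsed = x @ W.t() + bias

print(torch.allclose(composed, collapsed, atol=1e-5))  # True
```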

Thank you for any explanation you can provide for the benefits of the slightly more complex architecture implemented in deepsnap!

zechengz commented 3 years ago

Hi,

The idea of using separate linear layers for the self and neighbour features is mainly derived from the Relational GCN, which is briefly described on p. 13 of these slides. Adding another layer at the end plays the role of a post-processing layer, which is introduced on p. 52 of these slides. Using a post-processing layer is usually helpful, as shown in this paper.
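The post-processing idea mentioned above can be sketched as an MLP head applied to node embeddings after the message-passing layers (my own illustration of the pattern; the layer sizes and class count are arbitrary):

```python
import torch
import torch.nn as nn

# Hypothetical head that post-processes node embeddings produced by the
# GNN layers, e.g. before a node-classification loss.
post_process = nn.Sequential(
    nn.Linear(32, 32),
    nn.ReLU(),
    nn.Linear(32, 7),  # e.g. 7 output classes
)

node_embeddings = torch.randn(100, 32)  # stand-in for GNN output
logits = post_process(node_embeddings)
print(logits.shape)  # torch.Size([100, 7])
```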