tk-rusch / gradientgating

Gradient gating (ICLR 2023)
MIT License

Code Question #4

Open · Hamsss opened this issue 6 months ago

Hamsss commented 6 months ago

First of all, thank you for your paper and code; they have really helped me a lot. I have a question about your code. This is the model definition:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, GATConv
# G2 is the gradient-gating module defined elsewhere in this repo

class G2_GNN(nn.Module):
    def __init__(self, nfeat, nhid, nclass, nlayers, conv_type='GCN', p=2., drop_in=0, drop=0, use_gg_conv=True):
        super(G2_GNN, self).__init__()
        self.conv_type = conv_type
        self.enc = nn.Linear(nfeat, nhid)
        self.dec = nn.Linear(nhid, nclass)
        self.drop_in = drop_in
        self.drop = drop
        self.nlayers = nlayers
        if conv_type == 'GCN':
            self.conv = GCNConv(nhid, nhid)
            if use_gg_conv == True:
                self.conv_gg = GCNConv(nhid, nhid)
        elif conv_type == 'GAT':
            self.conv = GATConv(nhid, nhid, heads=4, concat=True)
            if use_gg_conv == True:
                self.conv_gg = GATConv(nhid, nhid, heads=4, concat=True)
        else:
            print('specified graph conv not implemented')

        # separate conv for the gating term if requested, otherwise reuse self.conv
        if use_gg_conv == True:
            self.G2 = G2(self.conv_gg, p, conv_type, activation=nn.ReLU())
        else:
            self.G2 = G2(self.conv, p, conv_type, activation=nn.ReLU())

    def forward(self, data):
        X = data.x
        n_nodes = X.size(0)
        edge_index = data.edge_index
        X = F.dropout(X, self.drop_in, training=self.training)
        X = torch.relu(self.enc(X))

        # the same self.conv and self.G2 are applied nlayers times (shared weights)
        for i in range(self.nlayers):
            if self.conv_type == 'GAT':
                X_ = F.elu(self.conv(X, edge_index)).view(n_nodes, -1, 4).mean(dim=-1)
            else:
                X_ = torch.relu(self.conv(X, edge_index))
            tau = self.G2(X, edge_index)
            X = (1 - tau) * X + tau * X_
        X = F.dropout(X, self.drop, training=self.training)

        return self.dec(X)
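
For reference, this is roughly how I call the model; the toy graph and hyperparameters below are just placeholder values I picked, not anything from the repo:

import torch
from torch_geometric.data import Data

# toy graph with placeholder sizes: 4 nodes, 3 input features, 2 classes
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]], dtype=torch.long)
data = Data(x=x, edge_index=edge_index)

model = G2_GNN(nfeat=3, nhid=16, nclass=2, nlayers=8, conv_type='GCN')
out = model(data)  # node-level logits of shape [4, 2]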

I thought nlayers was the number of layers, but when I looked at the code I realized that a single layer is applied nlayers times. So whether I set the number of layers to 16 or 32, there is effectively only one set of layer parameters. May I ask why you implemented the code like this, or did I misunderstand?

I also want to ask: if I want to check the model's performance, what is the order of the model? Do I just stack the G2 layers on top of each other?

tk-rusch commented 6 months ago

Thanks for reaching out. It is a multi-layer GNN; however, in our case we share the same parameters across the different layers. That's why we run the for-loop over the number of layers but call the same GNN module each time. The reasons for that are:

  1. It gives the same, and sometimes better, performance than using different weights for each layer.
  2. It corresponds to a graph-dynamical system modeled by a differential equation (please see our paper for that; a sketch of this reading is given below).
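
Concretely, reading this off the quoted forward() (my shorthand here, not the exact notation of the paper): with a single shared convolution $F$ and gate $\tau$, every pass through the loop computes

$$X^{n} = \bigl(1 - \tau(X^{n-1})\bigr)\odot X^{n-1} + \tau(X^{n-1})\odot \sigma\bigl(F(X^{n-1})\bigr),$$

which is an explicit Euler step (step size 1) of

$$\frac{dX}{dt} = \tau(X)\odot\bigl(\sigma(F(X)) - X\bigr),$$

so iterating the loop for nlayers steps runs the same dynamical system forward in time rather than composing nlayers different maps.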

I don't understand your second question, i.e., about checking the model performance.

Hamsss commented 6 months ago

Thank you so much for your answer. I really appreciate it. I was only thinking of layers with different parameters.

As for the second question, what I meant is that I would like to run this code with different parameters across the different layers.

bjarkedc commented 4 months ago

@Hamsss Did you have any success in adapting it to non-shared parameters?

@tk-rusch Is this a requirement for the Gradient Gating framework to work? Sorry if this is a novice question.

Thank you both in advance.

tk-rusch commented 4 months ago

No, it's absolutely not! You can extend it to use different parameters across the different layers by simply doing:

self.convs = nn.ModuleList()
for i in range(nlayers):
  self.convs.append(GCNConv(nhid, nhid))

and then in forward():

for i in range(self.nlayers):
  X_ = torch.relu(self.convs[i](X, edge_index))

You can do the same for the GG layers.
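
For completeness, here is a minimal sketch of how the pieces could fit together, with one conv and one G2 gate per layer (this is just an illustration of the idea, not tested code from the repo):

self.convs = nn.ModuleList()
self.gates = nn.ModuleList()
for i in range(nlayers):
    conv_i = GCNConv(nhid, nhid)       # per-layer feature conv
    conv_gg_i = GCNConv(nhid, nhid)    # per-layer conv used inside the gate
    self.convs.append(conv_i)
    self.gates.append(G2(conv_gg_i, p, 'GCN', activation=nn.ReLU()))

and in forward():

for i in range(self.nlayers):
    X_ = torch.relu(self.convs[i](X, edge_index))
    tau = self.gates[i](X, edge_index)
    X = (1 - tau) * X + tau * X_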

bjarkedc commented 4 months ago

@tk-rusch Thank you for the swift reply and example.

bjarkedc commented 2 months ago

@tk-rusch Just to be clear, I can have multiple layers that do not share weights, and I can apply a gradient gate to each of these layers. Is that correctly understood? This does not break anything on the theoretical side, correct? Thank you in advance! Sorry for the spam.