pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

Adding additional but completely useless GCN layers affects training results #6142

Open · WatsonLee opened this issue 1 year ago

WatsonLee commented 1 year ago

🐛 Describe the bug

Hello, we have run into a problem when using GCNConv and HypergraphConv for graph representation learning. We want to use GCNConv layers to process graph1 and HypergraphConv layers to process graph2. However, as soon as we add the HypergraphConv layers, even though they are never used in the forward pass, the training results of the original GCNConv-based model change significantly.

Original

from torch import nn
from torch.nn import init
from torch_geometric.nn import GCNConv

class GraphNN(nn.Module):
    def __init__(self, ntoken, ninp, dropout=0.1):
        super(GraphNN, self).__init__()
        self.embedding_heter = nn.Embedding(ntoken, ninp, padding_idx=0)
        self.ntoken = ntoken

        self.gnn1 = GCNConv(ninp, ninp * 2)
        self.gnn2 = GCNConv(ninp * 2, ninp)

        self.dropout = nn.Dropout(dropout)
        self.init_weights()

    def init_weights(self):
        init.xavier_normal_(self.embedding_heter.weight)

    def forward(self, heter_graph, hyper_graph, cas_embeddings=None, cas_weights=None):
        heter_graph_edge_index = heter_graph.edge_index.cuda()

        heter_graph_x_embeddings = self.gnn1(self.embedding_heter.weight, heter_graph_edge_index)
        heter_graph_x_embeddings = self.dropout(heter_graph_x_embeddings)
        heter_graph_output = self.gnn2(heter_graph_x_embeddings, heter_graph_edge_index)

        return heter_graph_output.cuda()

Modified

from torch import nn
from torch.nn import init
from torch_geometric.nn import GCNConv, HypergraphConv

class GraphNN(nn.Module):
    def __init__(self, ntoken, ninp, dropout=0.1):
        super(GraphNN, self).__init__()
        self.embedding_heter = nn.Embedding(ntoken, ninp, padding_idx=0)
        self.embedding_hyper = nn.Embedding(ntoken, ninp, padding_idx=0)  # added, never used in forward()

        self.ntoken = ntoken

        self.gnn1 = GCNConv(ninp, ninp * 2)
        self.gnn2 = GCNConv(ninp * 2, ninp)

        self.gnn3 = HypergraphConv(ninp, ninp * 2)  # added, never used in forward()
        self.gnn4 = HypergraphConv(ninp * 2, ninp)  # added, never used in forward()

        self.dropout = nn.Dropout(dropout)
        self.init_weights()

    def init_weights(self):
        init.xavier_normal_(self.embedding_heter.weight)
        init.xavier_normal_(self.embedding_hyper.weight)

    def forward(self, heter_graph, hyper_graph):
        heter_graph_edge_index = heter_graph.edge_index.cuda()

        heter_graph_x_embeddings = self.gnn1(self.embedding_heter.weight, heter_graph_edge_index)
        heter_graph_x_embeddings = self.dropout(heter_graph_x_embeddings)
        heter_graph_output = self.gnn2(heter_graph_x_embeddings, heter_graph_edge_index)

        return heter_graph_output.cuda(), None

We guess this is not a problem caused by parameter initialization, is it? We don't know how to fix it. Looking forward to your reply.

Environment

rusty1s commented 1 year ago

Mh, this would be weird - this certainly has something to do with weight initialization IMO. Given the same weights for the embedding and the GCNConv layers, the output should be the same. Can you test that?

Also: Did you fix the seed? What happens if you move the self.init_weights() call up directly after the GCNConv definitions (ensuring that Embedding layers are initialized the same)?
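
For instance, a quick sketch along these lines (sizes made up) should show whether merely constructing the extra modules advances the shared global RNG and thereby changes the initialization of the layers you actually use:

import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

def heter_weight(add_hyper):
    # Made-up sizes, only to probe the effect of extra module constructions:
    torch.manual_seed(0)
    emb_heter = nn.Embedding(100, 16, padding_idx=0)
    if add_hyper:
        # Constructing an extra (unused) module consumes random numbers
        # from the same global generator ...
        emb_hyper = nn.Embedding(100, 16, padding_idx=0)
    gnn1 = GCNConv(16, 32)
    # ... so every parameter created afterwards comes out different.
    return gnn1.lin.weight

print(torch.equal(heter_weight(False), heter_weight(True)))  # expected: False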

WatsonLee commented 1 year ago

Thank you for your suggestion. Yes, to ensure the reproducibility of our code, we fix the seeds of the random, torch, and numpy modules (among others) at the beginning.
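
Roughly, our seeding block looks like this (the exact seed value here is just a placeholder):

import random
import numpy as np
import torch

seed = 2022  # placeholder value
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True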

We tried moving the self.init_weights() call directly after the GCNConv definitions, but the experimental results differ from both the Original and the Modified versions.

We are not sure how to implement your suggestion, "given the same weights for embedding and GCNConv". Can you give us some hints?

Thank you.

rusty1s commented 1 year ago

A simple test would be

torch.save(conv1, 'conv1.pt')
torch.save(conv2, 'conv2.pt')

in your single (GCN-only) model, and then load them in your other model

self.conv1 = torch.load('conv1.pt')
self.conv2 = torch.load('conv2.pt')

You can confirm that the two models share the same weights by running

print(self.conv1.lin.weight)

on both models.
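
If you want to check the save/load round trip in isolation first, something like this (sizes made up) should print True:

import torch
from torch_geometric.nn import GCNConv

conv = GCNConv(16, 32)  # made-up sizes
torch.save(conv, 'conv1.pt')
loaded = torch.load('conv1.pt')

print(torch.equal(conv.lin.weight, loaded.lin.weight))  # expected: True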

WatsonLee commented 1 year ago

Hello, we have tried your suggestion, and the weights are indeed not the same. Moreover, when we increase the number of training epochs, the two models do reach similar results, although their convergence speed differs.

Thanks for your suggestion.