pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

My loss function in Pytorch Geometric does not train #4933

Open tayssirmoussa66 opened 2 years ago

tayssirmoussa66 commented 2 years ago

🐛 Describe the bug

I am training a GCN model using PyTorch Geometric that calculates the attention weight between each pair of nodes, but my loss does not change during training. This is my model:

import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch_geometric.nn import GraphConv, GATConv

class GCN(torch.nn.Module):

    def __init__(self, feature_size, n_layers, embedding_size, edge_dim, n_heads):
        super().__init__()

        self.layers = torch.nn.ModuleList()

        # Construct the input layer
        self.layers.append(GraphConv(feature_size, embedding_size))
        # Construct the hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(embedding_size, embedding_size))

        self.linear1 = Linear(embedding_size, embedding_size)

        self.att_layer = GATConv(embedding_size, embedding_size,
                                 heads=n_heads, edge_dim=edge_dim)

        self.linear2 = Linear(embedding_size * n_heads, embedding_size)
        self.linear3 = Linear(embedding_size * 2, embedding_size)
        self.out_layer = GATConv(embedding_size, embedding_size,
                                 heads=n_heads, edge_dim=edge_dim)

    def forward(self, x, edge_weight, edge_attr, edge_index):
        # Local embeddings
        for layer in self.layers:
            x = layer(x, edge_index, edge_weight)
            x = F.relu(x)

        # Attention layer
        y = self.att_layer(x, edge_index, edge_attr)
        y = F.elu(self.linear2(y))
        y = F.dropout(y, p=0.6, training=self.training)

        # Global embeddings
        global_emb = F.log_softmax(y, dim=1)
        concat_vector = torch.cat([x, global_emb], 1)
        concat_vector = self.linear3(concat_vector)

        # Calculating scores: the second element of `weights` holds the
        # attention coefficients, one per edge and head.
        out, weights = self.out_layer(concat_vector, edge_index, edge_attr,
                                      return_attention_weights=True)
        attention_weights = weights[1]
        attention_weights = torch.reshape(attention_weights, (-1,))

        return attention_weights

and this is the training function:

def train_one_epoch(model, data, optimizer, loss_fn):
    running_loss = 0.0
    step = 0

    for i in range(len(data)):
        # Use GPU
        data[i].to(device)
        # Reset gradients
        optimizer.zero_grad()

        # Passing the node features and the connection info
        output = model(data[i].x.float(),
                       data[i].edge_weight.float(),
                       data[i].edge_attr.float(),
                       data[i].edge_index)

        # Clamp negative labels to zero
        flat_label = torch.maximum(torch.tensor(0), data[i].y)

        # Calculating the loss and gradients
        loss = loss_fn(output, flat_label)
        loss.backward()
        optimizer.step()

        # Update tracking
        running_loss += loss.item()
        step += 1

    return running_loss / step

The loss remains exactly the same each epoch. What is happening? Why does the model not train?


rusty1s commented 2 years ago

How is your loss_fn defined? I am not sure it is a good idea to use the attention_weights as the training loss.

THinnerichs commented 2 years ago

Do you set the weights to be trainable with e.g. model.train() prior to the training step?
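For reference, a minimal sketch of toggling training mode before the epoch loop, assuming the train_one_epoch function above (num_epochs is a placeholder):

model.train()  # sets self.training = True, so F.dropout in forward() is active

for epoch in range(num_epochs):  # num_epochs: hypothetical placeholder
    avg_loss = train_one_epoch(model, data, optimizer, loss_fn)
    print(f"Epoch {epoch}: loss = {avg_loss:.4f}")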

tayssirmoussa66 commented 2 years ago

My model is used to predict reactivity scores between each pair of atoms in a chemical reaction. My loss_fn(input, target) is defined like this: the target y is a vector containing 1 if the bond between two atoms changed from reactant to product and 0 otherwise. The input is the attention_weights calculated by the model.
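For illustration only, a sketch of that pairing, assuming torch.nn.BCELoss and stand-in tensors (neither is confirmed in the thread):

import torch

# Attention weights lie in [0, 1], so binary cross-entropy against the
# 0/1 bond-change labels is one way to realize the described loss_fn.
loss_fn = torch.nn.BCELoss()

output = torch.rand(8, requires_grad=True)                    # stand-in attention weights
flat_label = torch.tensor([1., 0., 0., 1., 0., 0., 0., 1.])   # stand-in 0/1 labels

loss = loss_fn(output, flat_label)
loss.backward()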

rusty1s commented 2 years ago

I think it might make more sense to define your own head for this, e.g.:

out = MLP(torch.cat([x[edge_index[0]], x[edge_index[1]]], dim=-1))
return softmax(out, edge_index[1])

and then use a multi-class loss rather than a multi-label loss.
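A self-contained sketch of such a head (the EdgeScorer name is made up for illustration; MLP is torch_geometric.nn.MLP and softmax is torch_geometric.utils.softmax):

import torch
from torch_geometric.nn import MLP
from torch_geometric.utils import softmax

class EdgeScorer(torch.nn.Module):
    # Hypothetical module, shown only to flesh out the suggestion above.
    def __init__(self, embedding_size):
        super().__init__()
        # Maps the concatenated endpoint embeddings of an edge to one logit.
        self.mlp = MLP([2 * embedding_size, embedding_size, 1])

    def forward(self, x, edge_index):
        # One feature vector per edge: source embedding || target embedding.
        edge_feat = torch.cat([x[edge_index[0]], x[edge_index[1]]], dim=-1)
        logit = self.mlp(edge_feat).view(-1)
        # Normalize scores over all edges that share the same target node,
        # yielding a per-node distribution that a multi-class loss can fit.
        return softmax(logit, edge_index[1])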

tayssirmoussa66 commented 2 years ago

@rusty1s do you mean I don't need to use the GATConv layer?

rusty1s commented 2 years ago

Yes, if you are only interested in defining an edge score, using GATConv for this might be overkill.