pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

enzymes_topk_pool model is not learning #625

Closed sachinsharma9780 closed 2 years ago

sachinsharma9780 commented 5 years ago

❓ Questions & Help

Hi, I am using the enzymes_topk_pool (ETP) example for medical image classification. I have extracted features from the images and converted them into the data format accepted by the PyTorch Geometric data loader. But when I feed these features to the ETP model, it is not able to learn anything: training and test loss do not change from the first epoch until the end; everything remains constant. More info: it is a binary classification problem. Below I am attaching a small script so you get an idea.
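For reference, the conversion step described above presumably looks something like this (a sketch only; x, edge_index, label, and data_list are placeholders, not taken from the post):

    import torch
    from torch_geometric.data import Data, DataLoader

    # one graph per image: x is a [num_nodes, 41] feature matrix and
    # y is the binary graph label, stored as a float for BCELoss
    data = Data(x=x, edge_index=edge_index,
                y=torch.tensor([label], dtype=torch.float))
    train_loader = DataLoader(data_list, batch_size=32, shuffle=True)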

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GraphConv, TopKPooling
    from torch_geometric.nn import global_max_pool as gmp, global_mean_pool as gap

    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # dataset.num_node_features = 41
            self.conv1 = GraphConv(dataset.num_node_features, 64)
            self.pool1 = TopKPooling(64, ratio=0.8)
            self.conv2 = GraphConv(64, 64)
            self.pool2 = TopKPooling(64, ratio=0.8)
            self.conv3 = GraphConv(64, 64)
            self.pool3 = TopKPooling(64, ratio=0.8)

            self.lin1 = torch.nn.Linear(128, 128)
            self.lin2 = torch.nn.Linear(128, 64)
            self.lin3 = torch.nn.Linear(64, 1)
            self.bn1 = torch.nn.BatchNorm1d(128)
            self.bn2 = torch.nn.BatchNorm1d(64)

        def forward(self, data):
            x, edge_index, batch = data.x, data.edge_index, data.batch

            # three GraphConv + TopKPooling stages, each followed by a
            # global max/mean readout
            x = F.relu(self.conv1(x, edge_index))
            x, edge_index, _, batch, _ = self.pool1(x, edge_index, None, batch)
            x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

            x = F.relu(self.conv2(x, edge_index))
            x, edge_index, _, batch, _ = self.pool2(x, edge_index, None, batch)
            x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

            x = F.relu(self.conv3(x, edge_index))
            x, edge_index, _, batch, _ = self.pool3(x, edge_index, None, batch)
            x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

            # sum the readouts of all three stages
            x = x1 + x2 + x3

            x = F.relu(self.lin1(x))
            x = F.relu(self.lin2(x))
            x = torch.sigmoid(self.lin3(x)).squeeze(1)
            return x

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = Net().to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, verbose=True)

    crit = torch.nn.BCELoss()

    def train(epoch):
        model.train()

        loss_all = 0
        for data in train_loader:
            data = data.to(device)
            optimizer.zero_grad()
            output = model(data)
            label = data.y.float()  # data is already on `device`
            loss = crit(output, label)
            loss.backward()
            loss_all += data.num_graphs * loss.item()
            optimizer.step()
        scheduler.step(loss_all)
        return loss_all / len(train_data_list)

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def evaluate(loader):
        model.eval()

        predictions = []
        labels = []

        with torch.no_grad():
            for data in loader:
                data = data.to(device)
                pred = model(data).cpu().numpy()
                label = data.y.cpu().numpy()
                predictions.append(pred)
                labels.append(label)

        predictions = np.hstack(predictions)
        labels = np.hstack(labels)
        return roc_auc_score(labels, predictions)

    for epoch in range(1, 201):
        loss = train(epoch)
        train_auc = evaluate(train_loader)
        test_auc = evaluate(test_loader)
        print('Epoch: {:03d}, Loss: {:.5f}, Train AUC: {:.5f}, Test AUC: {:.5f}'.
              format(epoch, loss, train_auc, test_auc))

Note: For feature extraction from the images I used your Master's thesis code. I only used the Form_feature_extration file and adjacency.py, but not the feature_selection and coarsening files. Are those also needed to create the features? Currently, I have 41 features for every node in the image.

Thanks in advance!

rusty1s commented 5 years ago

There are a lot of useful operators that take edge features into account, e.g., NNConv, SplineConv, or GMMConv. In addition, it shouldn't be hard to implement one yourself, e.g.:

    def message(self, x_i, x_j, edge_attr):
        return self.lin(torch.cat([x_i, x_j, edge_attr], dim=-1))
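
For reference, a complete version of such an edge-aware layer could look roughly like this (a sketch only; the class name EdgeFeatConv, the mean aggregation, and the channel arguments are illustrative, not part of the thread):

    import torch
    from torch_geometric.nn import MessagePassing

    class EdgeFeatConv(MessagePassing):
        def __init__(self, in_channels, edge_channels, out_channels):
            super(EdgeFeatConv, self).__init__(aggr='mean')  # aggregation is a free choice
            self.lin = torch.nn.Linear(2 * in_channels + edge_channels, out_channels)

        def forward(self, x, edge_index, edge_attr):
            # propagate() invokes message() for every edge and aggregates the results
            return self.propagate(edge_index, x=x, edge_attr=edge_attr)

        def message(self, x_i, x_j, edge_attr):
            # concatenate target node, source node, and edge features of each edge
            return self.lin(torch.cat([x_i, x_j, edge_attr], dim=-1))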
sachinsharma9780 commented 5 years ago

OK, thanks for the suggestions.

sachinsharma9780 commented 5 years ago

Hi, I have created the graphs, and the following is the info for one of them: Data(edge_attr=[261632], edge_index=[2, 261632], x=[512, 64], y=[1])

My question is: I can only train with batch_size=1; with anything larger I get a CUDA out-of-memory error: Tried to allocate 3.99 GiB (GPU 0; 10.92 GiB total capacity; 4.51 GiB already allocated; 2.02 GiB free; 3.78 GiB cached). I don't know what is happening.

Total GPU memory is 11 GB.
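
For what it's worth, a back-of-the-envelope check shows where an allocation of exactly that size can come from, if the model uses the NNConv suggestion from above (an assumption; the thread does not say which operator is in use):

    # 512 nodes with 261632 = 512 * 511 edges means the graph is fully connected.
    # NNConv materializes one in_channels x out_channels weight matrix per edge;
    # with 64 input channels and 64 output channels in float32, that alone is:
    num_edges = 512 * 511            # = 261632, as in the Data object above
    bytes_needed = num_edges * 64 * 64 * 4
    print(bytes_needed / 1024 ** 3)  # ~3.99 GiB for a single graph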

sachinsharma9780 commented 5 years ago
  1. mask.sum().item() / mask.size(0) yields the split percentage.
  2. The order is different: we first apply the GCN to the whole graph and then select the specific nodes via the masks for loss/metric computation. This is a semi-supervised learning scenario where we make use of the whole graph structure but only use the ground truth of a small number of nodes (see the sketch below).
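
A minimal sketch of that order of operations (assuming a node-classification model and a single data graph with a train_mask; the names are generic, not from this thread):

    import torch.nn.functional as F

    out = model(data.x, data.edge_index)             # GCN runs on the whole graph
    loss = F.cross_entropy(out[data.train_mask],     # loss is computed only on the
                           data.y[data.train_mask])  # masked (labeled) nodes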

So this semi-supervised algorithm has one drawback: if a new data point arrives after training and we want to make a prediction on it, do we need to re-train on the whole graph including the new data point? Is that correct?

rusty1s commented 5 years ago

Generally, yes! However, your model might be able to generalize to some extent without re-training.
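
To illustrate the second point: the already-trained model can simply be re-run on the graph extended by the new data point, without any re-training (a sketch; new_node_features and new_edges are illustrative placeholders):

    import torch

    # append the new node's features and its connecting edges to the stored graph
    x = torch.cat([data.x, new_node_features.unsqueeze(0)], dim=0)
    edge_index = torch.cat([data.edge_index, new_edges], dim=1)

    model.eval()
    with torch.no_grad():
        out = model(x, edge_index)
    pred = out[-1]  # output row for the newly appended node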