pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

enzymes_topk_pool model is not learning #625

Closed sachinsharma9780 closed 1 year ago

sachinsharma9780 commented 4 years ago

❓ Questions & Help

Hi, I am using the enzymes_topk_pool (ETP) example for medical image classification. I have created features out of the images and converted them into the data format accepted by the PyTorch Geometric data loader. But when I feed these features to the ETP model, it is not able to learn anything: the training and test loss do not change from the first epoch until the end, everything remains constant. More info: it is a binary classification problem. Below I am attaching a small script so that you get an idea.

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # 41 = number of features
        self.conv1 = GraphConv(dataset.num_node_features, 64)
        self.pool1 = TopKPooling(64, ratio=0.8)
        self.conv2 = GraphConv(64, 64)
        self.pool2 = TopKPooling(64, ratio=0.8)
        self.conv3 = GraphConv(64, 64)
        self.pool3 = TopKPooling(64, ratio=0.8)

        self.lin1 = torch.nn.Linear(128, 128)
        self.lin2 = torch.nn.Linear(128, 64)
        self.lin3 = torch.nn.Linear(64, 1)
        self.bn1 = torch.nn.BatchNorm1d(128)
        self.bn2 = torch.nn.BatchNorm1d(64)
        #self.act1 = torch.nn.ReLU()
        #self.act2 = torch.nn.ReLU()

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        #edge_index, _ = remove_self_loops(edge_index)
        #edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))

        x = F.relu(self.conv1(x, edge_index))
        x, edge_index, _, batch, _ = self.pool1(x, edge_index, None, batch)
        x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = F.relu(self.conv2(x, edge_index))
        x, edge_index, _, batch, _ = self.pool2(x, edge_index, None, batch)
        x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = F.relu(self.conv3(x, edge_index))
        x, edge_index, _, batch, _ = self.pool3(x, edge_index, None, batch)
        x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

        x = x1 + x2 + x3

        x = F.relu(self.lin1(x))
        x = F.relu(self.lin2(x))
        #x = F.dropout(x, p=0.5, training=self.training)
        x = torch.sigmoid(self.lin3(x)).squeeze(1)
        #x = F.log_softmax(self.lin3(x), dim=-1)
        return x

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, verbose=True)

crit = torch.nn.BCELoss()
import pdb

def train(epoch):
    model.train()

    loss_all = 0
    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        output = model(data)
        #print('o/ps', output)
        #print('len', output.shape)
        label = data.y.to(device)
        label = torch.tensor(label, dtype=torch.float).to(device)
        #print('lbls', label)
        #print('lbl', label.shape)
        loss = crit(output, label)
        #print('loss', loss)
        #loss = crit(output, data.y)
        loss.backward(retain_graph=True)
        loss_all += data.num_graphs * loss.item()
        optimizer.step()
    scheduler.step(loss_all)
    return loss_all / len(train_data_list)

from sklearn.metrics import roc_auc_score

def evaluate(loader):
    model.eval()

    predictions = []
    labels = []

    with torch.no_grad():
        for data in loader:
            data = data.to(device)
            pred = model(data).detach().cpu().numpy()
            #print('pred ', pred)
            label = data.y.detach().cpu().numpy()
            #print('label ', label)
            predictions.append(pred)
            labels.append(label)

    predictions = np.hstack(predictions)
    #predictions = torch.cat(predictions)
    labels = np.hstack(labels)
    #labels = torch.cat(labels)

    return roc_auc_score(labels, predictions)

for epoch in range(1, 201):
    loss = train(epoch)
    train_auc = evaluate(train_loader)
    test_auc = evaluate(test_loader)
    #train_acc = test(train_loader)
    #test_acc = test(test_loader)
    print('Epoch: {:03d}, Loss: {:.5f}, Train Auc: {:.5f}, Test AUC: {:.5f}'.
          format(epoch, loss, train_auc, test_auc))

Note: for feature extraction from the images I have used your master's thesis code. I have only used the Form_feature_extration file and the adjacency.py file, but not the feature_selection and coarsening files. Are they also needed to create features? Currently I have 41 features for every node in the image.

Thanks in advance!

rusty1s commented 4 years ago

Hi, interesting approach, but it is quite hard for me to evaluate your model. Personally, I would start with a simpler baseline and only add more sophisticated models if it later turns out to improve performance. If your model does not learn anything, it is likely that

  1. your data may be corrupted. Have you visualized it? Are your features relevant for your task? You can evaluate this using a simple MLP; it should learn at least something. Is your edge connectivity correct?
  2. Your model might not be the best choice for your task. How do you encode relative spatial information of nodes in your model? You should look into using SplineCNN or NNConv to model that.
  3. Your hyperparameters might be wrong or you face numerical instabilities. The same accuracy across a number of epochs might well be a sign of NaN issues.
sachinsharma9780 commented 4 years ago

Hmmm.

  1. Corrupted? I have extracted features from the images (moments, mean, max, etc.) for every segment in the image. By visualizing, do you mean common data visualization techniques? By evaluation, do you mean I should give these extracted features to an MLP and look at the result? For the edge connectivity, i.e. to get the edge_index of each segmented image, I have used the following code with connectivity 8 (a conversion sketch follows at the end of this comment):

     def segmentation_adjacency(segmentation, connectivity=4):
         """Generate an adjacency matrix out of a given segmentation."""

         assert connectivity == 4 or connectivity == 8

         # Get centroids.
         idx = np.indices(segmentation.shape)
         ys = npg.aggregate(segmentation.flatten(), idx[0].flatten(), func='mean')
         xs = npg.aggregate(segmentation.flatten(), idx[1].flatten(), func='mean')
         ys = np.reshape(ys, (-1, 1))
         xs = np.reshape(xs, (-1, 1))
         points = np.concatenate((ys, xs), axis=1)

         # Get mass.
         nums, mass = np.unique(segmentation, return_counts=True)
         n = nums.shape[0]

         # Get adjacency (https://goo.gl/y1xFMq).
         tmp = np.zeros((n, n), np.bool)

         # Get vertical adjacency.
         a, b = segmentation[:-1, :], segmentation[1:, :]
         tmp[a[a != b], b[a != b]] = True

         # Get horizontal adjacency.
         a, b = segmentation[:, :-1], segmentation[:, 1:]
         tmp[a[a != b], b[a != b]] = True

         # Get diagonal adjacency.
         if connectivity == 8:
             a, b = segmentation[:-1, :-1], segmentation[1:, 1:]
             tmp[a[a != b], b[a != b]] = True

             a, b = segmentation[:-1, 1:], segmentation[1:, :-1]
             tmp[a[a != b], b[a != b]] = True

         result = tmp | tmp.T
         result = result.astype(np.uint8)
         adj = sp.coo_matrix(result)

         return adj

  2. This might be the case, but I have used a higher-order GCN, which I think is a fairly general model for graph inputs. By spatial info of a node do you mean edge_index, which I have calculated with the above code, or edge_attr? If edge_attr, I did not calculate it since I assumed this model does not require it; I guess it is not mandatory to give edge_attr info to the model.

  3. What do you mean by numerical instabilities? How do I check for them?

Below I am attaching some results per epoch: [Screenshot from 2019-08-15 19-02-14]
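As a side note, a minimal sketch of turning the returned scipy matrix into a PyTorch Geometric edge_index, assuming adj is the coo_matrix produced by segmentation_adjacency above:

    import torch
    from torch_geometric.utils import from_scipy_sparse_matrix

    # adj: scipy.sparse.coo_matrix from segmentation_adjacency(segmentation, connectivity=8)
    edge_index, edge_weight = from_scipy_sparse_matrix(adj)
    # edge_index has shape [2, num_edges]; edge_weight holds the (here binary) matrix entries.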

rusty1s commented 4 years ago
  1. Corrupted may be the wrong word, but I would suggest you still check that your pre-processing pipeline is correct. You can, e.g., map nodes to the mean position of their segment and visualize your resulting graph onto the image.
  2. I think it is absolutely mandatory to give spatial information to the model, otherwise the model cannot really find significant patterns in your data. This is normally done by encoding the relative spatial Cartesian coordinates as edge features, and by using a model which can make use of multi-dimensional edge features.
  3. I believe your model may be throwing NaNs at one point, so weights do not get updated. You can verify this using the torch.autograd.detect_anomaly API.
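A minimal sketch of such a check, wrapping one training step (model, crit, data and label as in the training loop above), so that the backward pass raises an error at the operation that produced a NaN/Inf:

    import torch

    # Anomaly detection points the traceback at the op that produced NaNs/Infs
    # instead of letting them silently corrupt the weights.
    with torch.autograd.detect_anomaly():
        output = model(data)
        loss = crit(output, label)
        loss.backward()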
sachinsharma9780 commented 4 years ago
  1. OK, I will check the preprocessing pipeline with the techniques you mentioned.
  2. OK, I will try to encode the spatial info as edge features. Which model is good at interpreting these edge features together with node features, is it NNConv or another one? Also, can you give me the code to get the spatial coordinates, or at least some reference code on how to generate them?
  3. I will check for NaN values.
  4. One short question: currently I generate 41 node features per image using mean, max, moments, etc. Could having so many features be causing problems, i.e. do I need to perform PCA or a feature selection algorithm to reduce the dimension? I am asking because I have seen your MNIST graph dataset, where you only created one node feature and two edge features.

Thank You! :)

rusty1s commented 4 years ago
  1. You can check the mnist examples in the examples/ directory. I would first test it without hierarchical pooling. You can add spatial coordinates via the Cartesian transform (a short sketch follows below).
  2. The mnist graph dataset is not related to the master thesis, it is provided by the MoNet paper. It contains one feature (the color) and 2 edge features (the spatial coordinates between two points). In general, a PCA is not needed for deep learning architectures.
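For illustration, a minimal sketch of attaching relative spatial information via the Cartesian transform, assuming data.pos holds the segment centroids:

    import torch_geometric.transforms as T

    # Stores the (normalized) relative Cartesian coordinates of connected
    # nodes in data.edge_attr; cat=False overwrites any existing edge features.
    data = T.Cartesian(cat=False)(data)

The same transform can also be passed as transform/pre_transform when building a dataset.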
sachinsharma9780 commented 4 years ago

OK, thanks! Since data.pos is also needed by the NNConv model, I am having difficulty understanding data.pos for a graph. The documentation says data.pos is the node position matrix with shape [num_nodes, num_dimensions]. But what does the second dimension ("num_dimensions") mean here? As far as I understood, the dimensions of a graph are the number of node features, but here it seems different; could you please enlighten me on this? Also, how can I generate these from the graph, is there a predefined function like the Cartesian transform?

Below I am attaching the result of what you suggested, drawing the graph onto the image by taking the mean of the segments. To me it seems fine. What do you say? Note: the image below is a retina of an eye. [image_0]

rusty1s commented 4 years ago

data.pos denotes the position of nodes in Euclidean space, e.g., for processing point clouds. The position differs from the input features since you generally do not want to input absolute coordinates into your model. In your case, data.pos should be a [N,2] tensor which holds for each node the mean coordinate of its segment. You already do this in the picture above.
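For illustration, a minimal sketch of a small graph carrying both node features and 2D positions (all values made up):

    import torch
    from torch_geometric.data import Data

    x = torch.randn(3, 41)                     # 41 hand-crafted features per node
    pos = torch.tensor([[12.3, 45.1],          # mean (y, x) coordinate of each segment
                        [30.0, 10.5],
                        [22.7, 60.2]])
    edge_index = torch.tensor([[0, 1, 1, 2],
                               [1, 0, 2, 1]])
    data = Data(x=x, edge_index=edge_index, pos=pos)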

sachinsharma9780 commented 4 years ago

Ahh, I get it. Thank you for the quick responses. I will analyze all the points you made and will try to use the NNConv model for my input data. I am keeping this issue open for further questions. Thank you for your help!

sachinsharma9780 commented 4 years ago

Hi,

  1. I made the changes you suggested, like adding spatial information and so on. Now the loss and accuracy are changing, but the results are still very bad, e.g. per-class accuracy tensor([0.3412, 0.6336]) (binary class problem (0, 1)). The loss comes down to 0.70 but the test accuracy remains bad (49%).

  2. Could it be that my features are noisy or not relevant for differentiating the 2 classes of my medical images dataset? I extracted these features using your master's thesis code, in which the form_features_extraction.py file extracts 41 features from each segment in the image.

  3. Could you please give me some reference on how to extract features efficiently from images, so that a graph neural network (NNConv) can find meaningful patterns in them?

Thanks!

rusty1s commented 4 years ago

2 and 3: I cannot say for sure. You are free to try any other features that may help your model; those are just the ones I tried in my thesis, although for MNIST the color feature alone is already sufficient.

Have you tried processing your data using traditional CNNs like ResNet? How do those models perform? In addition, this paper may be of interest to you.

sachinsharma9780 commented 4 years ago

Hmm, OK. No, I haven't tried using ResNet on my dataset. How will this help me? Also, I guess ResNet is not suitable for medical image classification. Thanks for the paper, I will read it; is there any reference implementation of it?

rusty1s commented 4 years ago

Well, the superpixel approach is just another technique to process images, so I do not see why you could not apply traditional CNNs to your task. In the end, their accuracy should be equal or even better. Curious why you think this is not suitable for medical images?

I do not think there is an open-source reference implementation, but you can contact the author and ask for it.

bknyaz commented 4 years ago

@rusty1s Thanks for pointing to my paper. I cannot release full code due to restrictions where I did this work, but I'll be happy to clarify details. @sachinsharma9780 I also have different pieces in my github: extraction of superpixels here and an example of learning multigraphs here. From these pieces and from the formulas in our paper it should be possible to build a similar model for your task.

But Matthias' thesis is so awesome that I would just use his code. You can then try to add hierarchical and other relationships from my paper to improve results.

Matthias, you should publish your thesis in English as a journal paper or at least a blog post. I should have cited it in my paper, but I wasn't aware of your work at that moment. Sorry about that.

I agree with Matthias that @sachinsharma9780 should first try CNN. People use ImageNet pretrained models for medical imaging even though ImageNet is very different, and it usually works great for mysterious reasons.

Sorry for my off topic comments :) If you need further help, please contact me.

sachinsharma9780 commented 4 years ago

OK, I was thinking of fine-tuning a ResNet model for medical images, since it is trained on ImageNet. But we can also use it without fine-tuning.

sachinsharma9780 commented 4 years ago


Thanks for the reply. I'll take a look at the information you provided and will contact you if I have any doubts.

rusty1s commented 4 years ago

Thank you for your reply @bknyaz. It was a pleasure to read your paper. Actually, my master thesis was the methodical foundation of our SplineCNN paper, and we already applied it to superpixels in our MNIST experiment.

However, I do believe that a classical CNN is much better suited for those tasks (and is sadly way faster, although it has a lot more data to consume), especially with the power of pre-training on ImageNet. Although ImageNet is quite different, as you said, using the pre-trained model weights as initialization is, IMO, understandably more powerful than random initialization (even when applied to very different tasks), because the model has already learned so much about general vision. It can then reuse this knowledge for more specific tasks.

sachinsharma9780 commented 4 years ago

One thing I forgot to mention is that my dataset is really small, only about 580 images, so I am performing experiments on that. Could that also be a reason why I am not getting good results?

rusty1s commented 4 years ago

Impossible to tell, you need baselines like MLPs and CNNs to judge the performance of your model.

sachinsharma9780 commented 4 years ago

OK, cool. I will create a baseline and then compare the performance of both models, a CNN and the graph NNConv.

sachinsharma9780 commented 4 years ago

Hi, I implemented NNConv on my binary dataset. The loss is now decreasing, but the training accuracy fluctuates around 50% and does not increase even after hundreds of epochs; it stabilizes around 50%. I don't know what is happening. I changed the learning rate dynamically and increased the number of neurons in the fully connected layers, but I still get the same result. Below is the modified NNConv script I am using:

def normalized_cut_2d(edge_index, pos):
    row, col = edge_index
    edge_attr = torch.norm(pos[row] - pos[col], p=2, dim=1)
    return normalized_cut(edge_index, edge_attr, num_nodes=pos.size(0))

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        # deals with edges:
        # nn.Sequential maps edge features [-1, num_edge_features] to shape
        # [-1, in_channels*out_channels], here [-1, 2] -> [-1, d.num_features*128]
        nn1 = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, d.num_features*128))
        self.conv1 = NNConv(d.num_features, 128, nn1, aggr='mean')

        nn2 = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 128*256))
        self.conv2 = NNConv(128, 256, nn2, aggr='mean')

        self.fc1 = torch.nn.Linear(256, 512)
        self.fc2 = torch.nn.Linear(512, d.num_classes)
        #self.bn1 = torch.nn.BatchNorm1d(256)
        #self.bn2 = torch.nn.BatchNorm1d(512)

    def forward(self, data):
        data.x = F.elu(self.conv1(data.x, data.edge_index, data.edge_attr))
        weight = normalized_cut_2d(data.edge_index, data.pos)
        cluster = graclus(data.edge_index, weight, data.x.size(0))
        data = max_pool(cluster, data, transform=T.Cartesian(cat=False))

        data.x = F.elu(self.conv2(data.x, data.edge_index, data.edge_attr))
        weight = normalized_cut_2d(data.edge_index, data.pos)
        cluster = graclus(data.edge_index, weight, data.x.size(0))
        x, batch = max_pool_x(cluster, data.x, data.batch)

        x = global_mean_pool(x, batch)
        x = F.elu(self.fc1(x))
        #x = F.dropout(x, training=self.training)
        #return torch.sigmoid(self.fc2(x)).squeeze(1)
        return F.log_softmax(self.fc2(x), dim=1)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = Net().to(device)

#lr = 0.00000001
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
weight = torch.tensor([1/750, 1/250]).to(device)

crit = nn.NLLLoss()
#crit = torch.nn.CrossEntropyLoss()
#crit = torch.nn.BCELoss()

def train(epoch):
    model.train()
    loss_all = 0.0
    train_correct = 0

    if epoch == 25:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.0001

    if epoch == 100:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.00001
            param_group['momentum'] = 0.6

    if epoch == 150:
        for param_group in optimizer.param_groups:
            param_group['lr'] = 0.000001
            param_group['momentum'] = 0.7

    for data in train_loader:
        data = data.to(device)
        optimizer.zero_grad()
        data.y = torch.tensor(data.y, dtype=torch.long).to(device)

        output = model(data)
        #print('o/p:', type(output), output)
        #print('label: ', type(data.y), data.y)
        loss = crit(output, data.y)
        loss.backward()
        # loss.item() gets the scalar value held in the loss.
        loss_all += data.num_graphs * loss.item()
        optimizer.step()
        train_correct += output.max(1)[1].eq(data.y).sum().item()
    #scheduler.step(loss_all)

    return loss_all / len(train_data_list), train_correct / len(train_data_list)

def test():
    model.eval()
    correct = 0
    with torch.no_grad():
        for data in test_loader:
            data = data.to(device)
            pred = model(data).max(1)[1]
            print('pred', pred)
            print('lbl', data.y)

            for t, p in zip(data.y.view(-1), pred.view(-1)):
                confusion_matrix[t.long(), p.long()] += 1

            data.y = torch.tensor(data.y, dtype=torch.long)
            data.y = data.y.to(device)
            correct += pred.eq(data.y).sum().item()
    return correct / len(test_data_list)

loss_plot = []
epoch_plot = []
test_acc_plot = []
train_acc_plot = []

for epoch in range(0, 400):
    loss, train_acc = train(epoch)
    #test_auc = evaluate(test_loader)
    train_acc_plot.append(train_acc)
    loss_plot.append(loss)
    epoch_plot.append(epoch)
    test_acc = test()
    test_acc_plot.append(test_acc)
    print('Epoch: {:02d}, Train_Loss: {:.5f}, Train_acc: {:.4f}, Test_acc: {:.4f}'.format(
        epoch, loss, train_acc, test_acc))

Is there any problem with my network? The preprocessing pipeline also seems fine to me. In the image below you can see my training accuracy w.r.t. epochs. [Train_acc]

rusty1s commented 4 years ago

It would be easier to read your code if it was formatted correctly. Your code looks mostly correct to me, but for binary classification you should use a one-dimensional output and the BCEWithLogitsLoss loss. You can also comment out the graclus pooling calls and see if this improves the model. How do the baselines perform?
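A minimal sketch of that change, assuming the final layer is replaced by torch.nn.Linear(512, 1) and the model returns raw logits (no sigmoid/log_softmax):

    import torch

    crit = torch.nn.BCEWithLogitsLoss()

    output = model(data)                   # raw logits, shape [num_graphs], e.g. return self.fc2(x).squeeze(1)
    loss = crit(output, data.y.float())    # the sigmoid is applied inside the loss
    pred = (output > 0).long()             # threshold logits at 0 for accuracy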

sachinsharma9780 commented 4 years ago

I am sorry for the unclean code, I was just trying different things. Initially I used a sigmoid at the last layer with BCE loss, but the model gave a constant result. Then I switched to softmax, but I think it doesn't make much difference. I have tested my dataset with a ResNet-50 architecture; there I also get around 56% test accuracy, but the training accuracy increases steadily, getting close to 70% within 10 epochs, unlike NNConv, where the training accuracy fluctuates heavily, as you can see from the graph above.

I have also looked at the images of the two classes in my dataset and they look more or less the same (do you think this could be the reason for the bad results?). But along with each image I also have metadata for each patient (like age, sex, ethnicity, etc.), which I haven't incorporated into the network; I am only training on the images. Maybe this metadata can make a difference, I don't know yet. What's your take?

sachinsharma9780 commented 4 years ago

Hi, any suggestions on the above comment?

rusty1s commented 4 years ago

Sorry, i missed your previous comment. Adding metadata to your model sounds reasonable. You can do this by concatenating the features to your CNN output.

If your images look the same though, and you cannot even distinguish them as an expert, I guess it is quite hard for a CNN to distinguish them as well. Think about what may indicate the separation of your classes, and how a model may be able to learn to recognize those to improve your model.

sachinsharma9780 commented 4 years ago

Thanks for the reply. Can you point me to any reference paper/code for this idea of "adding metadata to your model by concatenating the features to your CNN output"? That would be really helpful.

rusty1s commented 4 years ago

No specific paper in mind. I guess it is just the standard approach to add global information to your CNN.

sachinsharma9780 commented 4 years ago


Hi, sorry for the late reply, I was busy with exams! Regarding your comment above on concatenating features to the CNN output: where and how should I concatenate the metadata with the feature embeddings learned by the CNN?

My idea is: I have two types of data, image data and metadata. First the image data is passed into a CNN and feature embeddings are learned from it; I then store these feature embeddings. Next I use the embeddings as node features in a graph, with the metadata used to compute edge weights between nodes, which turns this into a kind of semi-supervised learning problem using a simple GCN network. Do you think this approach seems reasonable?

Thanks!

rusty1s commented 4 years ago

Hi, I do not see how you can create a graph out of your metadata (e.g. age or sex). What would your graph look like? I believe you can simply learn the CNN embeddings end-to-end, but before making the final predictions via an MLP, you concatenate the metadata to your embeddings:


CNN -------+
           |
           +---> MLP ---> output
           |
Metadata --+
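A minimal sketch of this fusion; the backbone, dimensions, and hidden size are placeholders:

    import torch
    import torch.nn as nn

    class CNNWithMetadata(nn.Module):
        def __init__(self, cnn, cnn_out_dim, meta_dim, num_classes):
            super().__init__()
            self.cnn = cnn                              # any image backbone returning [B, cnn_out_dim]
            self.mlp = nn.Sequential(
                nn.Linear(cnn_out_dim + meta_dim, 128),
                nn.ReLU(),
                nn.Linear(128, num_classes))

        def forward(self, image, metadata):
            emb = self.cnn(image)                       # [B, cnn_out_dim]
            x = torch.cat([emb, metadata], dim=1)       # concatenate metadata to the embedding
            return self.mlp(x)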
sachinsharma9780 commented 4 years ago

Screenshot from 2019-10-08 23-01-56

I am trying to do something like in the image, but there the features extracted from the image are handcrafted specifically for that problem. The edge weights are calculated based on the similarity between two nodes using the metadata, and then a simple GCN is applied, which makes the problem semi-supervised.

My idea is, instead of handcrafting features from the image, to learn feature embeddings with a CNN and use those embeddings as the node features; the rest stays the same as above.

CNN ---> feature embeddings ---> represented as nodes in graph ---+
                                                                  |
                                                                  +---> GCN ---> o/p
                                                                  |
Metadata ---> edge weights between nodes, computed by a ----------+
              similarity measure on the metadata

Reference: https://arxiv.org/pdf/1806.01738.pdf

rusty1s commented 4 years ago

Interesting idea. I guess you can do this. While you technically can train this network end-to-end, it might be easier to just use the output of a pre-trained CNN.

sachinsharma9780 commented 4 years ago

Hmm, yeah, I was also thinking of using a pretrained CNN. Thanks. I will try this approach and let you know about the results.

sachinsharma9780 commented 4 years ago

If I am using a ResNet-50 pretrained on ImageNet (which is giving me good results), from which layer should I extract the features? Usually the parameters of the initial layers are frozen and we only train the fully connected layers. So should I extract the feature embeddings from the first fully connected layer?

rusty1s commented 4 years ago

I suggest using the feature embeddings obtained right before the final fully connected prediction takes place. But I am no expert on this, so I suggest you also consult the relevant literature.
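A minimal sketch of that, assuming a torchvision ResNet-50 and a preprocessed image batch images of shape [B, 3, 224, 224]:

    import torch
    import torchvision.models as models

    resnet = models.resnet50(pretrained=True)
    resnet.fc = torch.nn.Identity()      # drop the final classification layer
    resnet.eval()

    with torch.no_grad():
        emb = resnet(images)             # [B, 2048] embeddings from just before the fc layer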

sachinsharma9780 commented 4 years ago

OK, thanks for the help!

sachinsharma9780 commented 4 years ago

Hi, one small doubt regarding the data handling of graphs. I am confused about how, when we add edge attributes in data.edge_attr, data.edge_attr knows which edge attribute belongs to which pair of nodes. For example, say we have a complete graph with 3 nodes and 3 edges, and I want to compute the cosine similarity between the feature vectors of each pair of nodes. How does data.edge_attr encode that a given edge feature belongs to those two particular nodes, given that data.edge_attr only takes the edge features as input in the shape [num_edges, num_edge_features]?

Thank you!

rusty1s commented 4 years ago

edge_attr follows the ordering defined in edge_index. For example, the first entry in edge_attr corresponds to the edge defined in edge_index[:, 0].

sachinsharma9780 commented 4 years ago

Hmm, OK. If that is the case, then for the above example of a complete graph with 3 nodes (undirected, 3 edges), do we need to provide edge attribute info for every entry (which will be 6 entries), e.g. for these 6 pairs: source = [0, 0, 1, 1, 2, 2], target = [1, 2, 0, 2, 0, 1], creating an edge_attr of size [6, 1], even though we only have 3 edges in the graph?

rusty1s commented 4 years ago

Yes :)
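For the 3-node example, a minimal sketch (similarity values made up) showing each undirected edge stored in both directions, with its attribute repeated in the same column order as edge_index:

    import torch
    from torch_geometric.data import Data

    edge_index = torch.tensor([[0, 1, 0, 2, 1, 2],
                               [1, 0, 2, 0, 2, 1]])   # both directions of the 3 undirected edges
    edge_attr = torch.tensor([[0.9], [0.9],
                              [0.4], [0.4],
                              [0.7], [0.7]])          # shape [6, 1], aligned with edge_index columns
    data = Data(edge_index=edge_index, edge_attr=edge_attr)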

sachinsharma9780 commented 4 years ago

OK! Thanks!

sachinsharma9780 commented 4 years ago

Hi, while running the GCN example for my problem I got the following error: RuntimeError: Subtraction, the - operator, with a bool tensor is not supported. If you are trying to invert a mask, use the ~ or bitwise_not() operator instead.

I followed the instructions given in this issue: https://github.com/rusty1s/pytorch_geometric/issues/616 but still cannot resolve it by updating torch_cluster. Is there any other way besides cloning the repository?

sachinsharma9780 commented 4 years ago
  1. One more small doubt regarding GCN: you use masking to create the test_mask=[], train_mask=[], val_mask=[] data (basically nodes), and as input to these masks you give the number of nodes in the graph. My question is: how do we decide the split percentage of the dataset, e.g. train_data = 80%, val_data = 10%, test_data = 10%?
  2. So is this the process to perform GCN: first the whole dataset is converted to a graph, then using masking you set some nodes as train, validation and test nodes from the whole graph, and after that we apply the GCN algorithm on it. Correct me if I am wrong.
rusty1s commented 4 years ago
  1. mask.sum().item() / mask.size(0) yields the split percentage.
  2. The order is different. We first apply GCN and then select the specific nodes using the masks for loss/metric computation. This is a semi-supervised learning scenario where we make use of the whole graph structure but only use the ground truth of a small number of nodes.
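A minimal sketch of building such masks for an 80/10/10 split and using them in training (model and data as in the GCN example; the split sizes are placeholders):

    import torch
    import torch.nn.functional as F

    num_nodes = data.num_nodes
    perm = torch.randperm(num_nodes)
    n_train, n_val = int(0.8 * num_nodes), int(0.1 * num_nodes)

    data.train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    data.train_mask[perm[:n_train]] = True
    data.val_mask[perm[n_train:n_train + n_val]] = True
    data.test_mask[perm[n_train + n_val:]] = True

    # mask.sum().item() / mask.size(0) yields the split percentage.
    out = model(data)                                                 # GCN runs on the whole graph
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])  # supervise only the training nodes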
sachinsharma9780 commented 4 years ago

OK. So in the second point we still need to give features, labels and edge attributes for every node, and what masking does is basically hold a node out so that we can test our performance on the masked nodes (i.e. it just randomly assigns nodes to the train, val and test sets)?

rusty1s commented 4 years ago

I am not sure what you mean with "disable that node". The masking tensor just defines on which nodes we want to train our parameters (train_mask), validate them (val_mask) and test them (test_mask).

sachinsharma9780 commented 4 years ago

Hi, I created a graph data structure similar to the way described in your documentation as input to the GCN model. It works fine when I do not include edge_weights/edge_attr, but when I try to use edge weights it gives me the error below. I tried to debug it but was not successful. Any help? The following is the error:

edge_index shape: torch.Size([2, 56])
edge_attr shape: torch.Size([56, 1])
label shape torch.Size([8])
graph data Data(edge_attr=[56, 1], edge_index=[2, 56], test_mask=[8], train_mask=[8], val_mask=[8], x=[8, 512], y=[8])
train_massk tensor([ True,  True,  True,  True,  True, False, False, False])
test_mask tensor([ True,  True, False, False, False, False, False, False])
val_mask tensor([ True,  True, False, False, False, False, False, False])
ndoe features 512
train_mask sum 5
val_mask sum 2
test_mask sum 2

Traceback (most recent call last):
  File "create_graph_data_structure.py", line 141, in <module>
    train()
  File "create_graph_data_structure.py", line 125, in train
    F.nll_loss(model()[data.train_mask], data.y[data.train_mask]).backward()
  File "/b_test/sharma/guided_research_vitrualenv/gr_pytorchg/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "create_graph_data_structure.py", line 111, in forward
    x = F.relu(self.conv1(x, edge_index, edge_attr))
  File "/b_test/sharma/guided_research_vitrualenv/gr_pytorchg/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/b_test/sharma/guided_research_vitrualenv/gr_pytorchg/lib/python3.6/site-packages/torch_geometric/nn/conv/gcn_conv.py", line 100, in forward
    self.improved, x.dtype)
  File "/b_test/sharma/guided_research_vitrualenv/gr_pytorchg/lib/python3.6/site-packages/torch_geometric/nn/conv/gcn_conv.py", line 78, in norm
    edge_index, edge_weight, fill_value, num_nodes)
  File "/b_test/sharma/guided_research_vitrualenv/gr_pytorchg/lib/python3.6/site-packages/torch_geometric/utils/loop.py", line 112, in add_remaining_self_loops
    edge_weight = torch.cat([edge_weight[mask], loop_weight], dim=0)
RuntimeError: invalid argument 0: Tensors must have same number of dimensions: got 2 and 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:680

rusty1s commented 4 years ago

Try out edge_weight.flatten().
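i.e., a minimal sketch, assuming edge_attr has shape [56, 1] as in the output above:

    # GCNConv expects a one-dimensional edge_weight of shape [num_edges].
    edge_weight = data.edge_attr.flatten()               # torch.Size([56, 1]) -> torch.Size([56])
    x = F.relu(self.conv1(x, edge_index, edge_weight))   # inside the model's forward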

sachinsharma9780 commented 4 years ago

Thank you very much, it is working now!

sachinsharma9780 commented 4 years ago

graph_ex

Hi, I want to know what data.pos is for a graph like the one above. If I am building a graph on my own with node features and edge features, do I need to compute data.pos? In the case of an image we basically use the centroid of a segment, but what about a graph like the one above? Should it be data.pos = [0, 1, 2]^T?

Thanx

rusty1s commented 4 years ago

The data.pos attribute is optional and should only be used if nodes have a position in Euclidean space (like point clouds or superpixel centroids). You can simply omit it in the example above.

sachinsharma9780 commented 4 years ago

OK. Actually, I was using NNConv for graph classification, and there the normalized cut function asks for data.pos as input, which is why I asked. But I guess in my case I have 1-dimensional edge attributes and I can just pass the number of nodes in the num_nodes argument, e.g. normalized_cut(edge_index, edge_attr, num_nodes=64).

One more question: which graph algorithm is good for a graph classification problem? My graphs have node features and 1-D edge attributes. One I know is NNConv (edge-conditioned convolution). Can you suggest some others?