pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License
20.91k stars 3.61k forks

Dataset construction for implementing GCN #926

Open saimanikant62 opened 4 years ago

saimanikant62 commented 4 years ago

Hi

We have been really fascinated by the PyTorch Geometric package for geometric deep learning and want to implement a GCN on our own dataset. We have gone through the guidelines at https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html but need more help with creating datasets.

We want to create a graph with 100 nodes, each having 10 features. We want to define the edges based on our own criteria, which will come later. We were hoping for a worked example of dataset construction that would lay the foundation for our dataset. Any help would be appreciated.

rusty1s commented 4 years ago

Do you have specific questions regarding dataset creation? This is mostly just PyTorch code, but instead of returning tensors in __getitem__ we return Data objects. For your problem, something like this might be useful:

import torch
from torch_geometric.data import Data


class MyDataset(torch.utils.data.Dataset):
    def __init__(self, graph_features, transform=None):
        self.graph_features = graph_features  # A [num_graphs, 100, 10] tensor.
        self.transform = transform  # Optional callable that can add edge_index.

    def __len__(self):
        # One graph per entry along the first dimension.
        return self.graph_features.size(0)

    def __getitem__(self, idx):
        x = self.graph_features[idx]  # Node features of shape [100, 10].
        data = Data(x=x)
        if self.transform is not None:
            data = self.transform(data)
        return data

where transform can define edge_index based on your criteria.
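As a rough illustration of what such a transform could compute, here is a pure-Python sketch. The criterion is purely hypothetical (connect two nodes when their feature vectors lie within a Euclidean distance `radius`), and the name `threshold_edges` is made up for this example; your own rule would replace the `if` condition.

```python
import math


def threshold_edges(x, radius):
    """Hypothetical edge criterion: connect every ordered pair of distinct
    nodes whose feature vectors are at most `radius` apart (Euclidean).

    `x` is a list of per-node feature lists. The result is an edge list in
    COO form: [source_nodes, target_nodes].
    """
    sources, targets = [], []
    for i, xi in enumerate(x):
        for j, xj in enumerate(x):
            if i != j and math.dist(xi, xj) <= radius:
                sources.append(i)
                targets.append(j)
    return [sources, targets]


# Three nodes with two features each; only nodes 0 and 1 are close together.
x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
edge_index = threshold_edges(x, radius=1.5)
print(edge_index)
```

Inside a transform, the result could then be stored as `data.edge_index = torch.tensor(edge_index, dtype=torch.long)` before returning `data`.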

saimanikant62 commented 4 years ago

To be specific, I have an adjacency matrix like the following, where A, B, C and D are nodes and the cells give the edges between the nodes (A matrix):

  | A | B | C | D
A | 0 | 4 | 6 | 9
B | 2 | 0 | 6 | 9
C | 3 | 5 | 0 | 3
D | 4 | 5 | 2 | 0

I also have the feature matrix of the nodes (X matrix), which describes the characteristics of the individual nodes:

  | Feature 1 | Feature 2 | Feature 3 | Feature 4 | Labels
A |   |   |   |   |
B |   |   |   |   |
C |   |   |   |   |
D |   |   |   |   |

The labels are the classes of the nodes that I am trying to predict. The nodes are labelled, and I want to classify them according to these labels. Given these two data sets (the A and X matrices) and my labels, how do I feed the data into a GCN model?
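One possible way to prepare the A matrix is to convert it into the sparse COO edge list that PyG expects as `edge_index`. The sketch below is pure Python and uses the 4x4 matrix from this comment. It assumes that row i, column j holds the weight of the edge from node i to node j and that a zero means "no edge"; the helper name `adjacency_to_coo` is made up for this example.

```python
def adjacency_to_coo(adj):
    """Turn a dense adjacency matrix (list of rows) into COO form.

    Returns ([source_nodes, target_nodes], edge_weights). Assumes adj[i][j]
    is the weight of the edge i -> j and that 0 means the edge is absent.
    """
    sources, targets, weights = [], [], []
    for i, row in enumerate(adj):
        for j, w in enumerate(row):
            if w != 0:
                sources.append(i)
                targets.append(j)
                weights.append(w)
    return [sources, targets], weights


# Nodes A, B, C, D are mapped to indices 0, 1, 2, 3.
A = [
    [0, 4, 6, 9],
    [2, 0, 6, 9],
    [3, 5, 0, 3],
    [4, 5, 2, 0],
]
edge_index, edge_weight = adjacency_to_coo(A)
print(edge_index)   # two lists of length 12: sources and targets
print(edge_weight)  # the 12 non-zero weights, in row-major order
```

With PyG installed, these lists could be wrapped in tensors and combined with the X matrix and the labels in a single object, for example `Data(x=x, edge_index=torch.tensor(edge_index, dtype=torch.long), edge_attr=torch.tensor(edge_weight, dtype=torch.float), y=labels)`, where `x` and `labels` are your own tensors.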

Maddy12 commented 4 years ago

Can that example be used for adding edge indices as well? When the data loader loads a batch of data objects, how does it handle the edge index array? And how will that be handled in the graph convolutions?
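On the batching question: PyG's own `DataLoader` merges the graphs of a mini-batch into one large disconnected graph. It shifts each graph's `edge_index` by the number of nodes that came before it and records which graph every node belongs to in a `batch` vector. The pure-Python sketch below only imitates that idea and is not the library's implementation; `collate_graphs` and the tuple layout are made up for this example.

```python
def collate_graphs(graphs):
    """Merge several graphs into one disconnected graph.

    `graphs` is a list of (num_nodes, [sources, targets]) pairs. Returns the
    combined edge_index and a batch vector mapping each node to its graph.
    """
    sources, targets, batch = [], [], []
    offset = 0  # Number of nodes contributed by the graphs seen so far.
    for graph_id, (num_nodes, (src, dst)) in enumerate(graphs):
        sources.extend(s + offset for s in src)
        targets.extend(t + offset for t in dst)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return [sources, targets], batch


g1 = (3, [[0, 1], [1, 2]])  # 3 nodes, edges 0->1 and 1->2
g2 = (2, [[0], [1]])        # 2 nodes, edge 0->1
edge_index, batch = collate_graphs([g1, g2])
print(edge_index)  # [[0, 1, 3], [1, 2, 4]]
print(batch)       # [0, 0, 0, 1, 1]
```

Because the merged graph has no edges between nodes of different graphs, a graph convolution run on the batch passes messages only within each original graph.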