gnn_benchmark_dataset raw data format

pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch

https://pyg.org

MIT License


gnn_benchmark_dataset raw data format #3053

Closed devnkong closed 3 years ago

devnkong commented 3 years ago

https://github.com/rusty1s/pytorch_geometric/blob/e425622d6efc6832b15e9fe577710a7119d76cef/torch_geometric/datasets/gnn_benchmark_dataset.py#L53-L54

Hello Matthias,

I wanna know how I can generate the raw data used above. In the original paper's code, the raw data is loaded here, which seems quite different from yours. Thanks!

Best, Kezhi

rusty1s commented 3 years ago

The raw data is pre-processed by me so that it can be accessed without using pickle. (Note that pickle has trouble loading the data without access to the classes and file structure of the GNNBenchmark repo.) Otherwise, the data is equivalent.
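To illustrate the pickle issue mentioned above, here is a minimal standard-library sketch. The module name `fake_repo` and the class `GraphDataset` are hypothetical stand-ins, not names from GNNBenchmark or PyG. The point is that a pickled instance of a custom class only stores a reference to that class, so loading fails once the defining module is no longer importable, while plain built-in containers load anywhere.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a repo-specific module; "fake_repo" and
# "GraphDataset" are illustrative names only.
module = types.ModuleType("fake_repo")
exec(
    "class GraphDataset:\n"
    "    def __init__(self):\n"
    "        self.items = [1, 2, 3]\n",
    module.__dict__,
)
sys.modules["fake_repo"] = module

# Pickle stores a reference ("fake_repo.GraphDataset") plus the instance state.
custom_payload = pickle.dumps(module.GraphDataset())
# A plain dict of built-in types carries no such class reference.
plain_payload = pickle.dumps({"items": [1, 2, 3]})

# Simulate loading the file in an environment where the repo is not importable.
del sys.modules["fake_repo"]

try:
    pickle.loads(custom_payload)
    load_error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
    load_error = exc

restored = pickle.loads(plain_payload)
print(type(load_error).__name__)
print(restored)
```

In this sketch the first load should fail because `fake_repo` can no longer be imported, while the plain dict round-trips without needing any extra code on the loading side.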

devnkong commented 3 years ago

I see. Is it possible for you to share the preprocessing script? I wanna create a dataset using GNN Benchmark's method but code it under the PyG scheme, thx! I could move faster with your help!

rusty1s commented 3 years ago

This is the one I used for loading the SBM datasets (others are similar):

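from itertools import chain

import torch

# SBMsDataset comes from the GNNBenchmark repo (import path assumed: data/SBMs.py),
# so this script has to be run from inside that repo.
from data.SBMs import SBMsDataset

name = 'PATTERN'

dataset = SBMsDataset(f'SBM_{name}')

num_train = len(dataset.train)
num_val = len(dataset.val)

# Convert every DGL graph into a plain dict of tensors that needs no custom classes.
data_list = []
for G, y in chain(dataset.train, dataset.val, dataset.test):
    y = y.to(torch.long)
    # one_hot infers the number of classes from the largest value it sees;
    # pass num_classes explicitly if a graph might not contain every feature value.
    x = torch.nn.functional.one_hot(G.ndata['feat']).to(torch.float)

    row, col = G.edges()
    edge_index = torch.stack([row, col], dim=0).to(torch.long)
    data = {'x': x, 'y': y, 'edge_index': edge_index}
    data_list.append(data)

# Recover the original train/val/test splits from the concatenated list.
train_data_list = data_list[:num_train]
val_data_list = data_list[num_train:num_train + num_val]
test_data_list = data_list[num_train + num_val:]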
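The chain-then-slice pattern used in the script can be checked on toy data. The sketch below uses plain strings as hypothetical stand-ins for the graphs in `dataset.train`, `dataset.val`, and `dataset.test`, and `str.upper` as a placeholder for the per-graph conversion. It only shows that slicing by `num_train` and `num_val` recovers the original splits in order.

```python
from itertools import chain

# Toy stand-ins for dataset.train / dataset.val / dataset.test.
train, val, test = ["t0", "t1", "t2"], ["v0"], ["s0", "s1"]

num_train = len(train)
num_val = len(val)

# Process everything in one pass, in train -> val -> test order.
processed = [item.upper() for item in chain(train, val, test)]

# Slice the processed list back into the three splits by position.
train_out = processed[:num_train]
val_out = processed[num_train:num_train + num_val]
test_out = processed[num_train + num_val:]

print(train_out, val_out, test_out)
```

This works because `chain` preserves the order of its inputs, so the split boundaries depend only on the lengths of the training and validation sets.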

devnkong commented 3 years ago

Thank you!