devnkong closed this issue 3 years ago
The raw data is pre-processed by me so that it can be accessed without using pickle. (Note that pickle has trouble loading the data without having access to the classes and file structure of the GNNBenchmark repo.) Otherwise, the data is equivalent.
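To illustrate the pickle limitation mentioned above, here is a minimal sketch using only the standard library. The module name `fake_benchmark_repo` and the class `GraphRecord` are hypothetical stand-ins, not anything from the GNNBenchmark repo. The point is that a pickled instance of a custom class stores only a reference to its module and class name, so loading fails once that module is unavailable, while plain containers load anywhere.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that only exists inside a specific repo.
repo = types.ModuleType("fake_benchmark_repo")


class GraphRecord:
    def __init__(self, edges):
        self.edges = edges


# Make the class look as if it were defined in the stand-in module.
GraphRecord.__module__ = "fake_benchmark_repo"
GraphRecord.__qualname__ = "GraphRecord"
repo.GraphRecord = GraphRecord
sys.modules["fake_benchmark_repo"] = repo

# Pickle one custom object and one plain dict holding the same information.
custom_blob = pickle.dumps(GraphRecord([(0, 1), (1, 2)]))
plain_blob = pickle.dumps({"edges": [(0, 1), (1, 2)]})

# Simulate an environment where the repo is not installed.
del sys.modules["fake_benchmark_repo"]

try:
    pickle.loads(custom_blob)
    custom_loaded = True
except ImportError:
    # pickle cannot import the module that defines GraphRecord.
    custom_loaded = False

# The plain dict does not depend on any project-specific class.
plain = pickle.loads(plain_blob)
print(custom_loaded, plain)
```

This is why storing plain tensors and dicts, rather than repo-specific objects, makes the files loadable from other projects.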
I see. Is it possible for you to share the preprocessing script? I want to create a dataset using GNN Benchmark's method, but I'd like to code under the PyG scheme. Thanks! I could move faster with your help!
This is the one I used for loading the SBM datasets (others are similar):
from itertools import chain

import torch

# SBMsDataset is the dataset class from the GNNBenchmark repo; import it from your checkout.

name = 'PATTERN'
dataset = SBMsDataset(f'SBM_{name}')

num_train = len(dataset.train)
num_val = len(dataset.val)

data_list = []
for G, y in chain(dataset.train, dataset.val, dataset.test):
    y = y.to(torch.long)
    # one_hot infers the number of classes from the largest value in its input.
    x = torch.nn.functional.one_hot(G.ndata['feat']).to(torch.float)
    row, col = G.edges()
    edge_index = torch.stack([row, col], dim=0).to(torch.long)
    data = {'x': x, 'y': y, 'edge_index': edge_index}
    data_list.append(data)

# Slice the concatenated list back into the original splits.
train_data_list = data_list[:num_train]
val_data_list = data_list[num_train:num_train + num_val]
test_data_list = data_list[num_train + num_val:]
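The split bookkeeping in the script relies on `chain` preserving order, so slicing by the recorded lengths recovers the original splits. A small sketch with placeholder strings instead of graphs (the item names are made up for illustration) shows the pattern in isolation:

```python
from itertools import chain

# Placeholder items standing in for the processed graphs of each split.
train = ["t0", "t1", "t2"]
val = ["v0"]
test = ["s0", "s1"]

# Record the split sizes before concatenating.
num_train = len(train)
num_val = len(val)

# chain yields train items first, then val, then test, in order.
data_list = list(chain(train, val, test))

# Slice by the recorded sizes to recover each split.
train_part = data_list[:num_train]
val_part = data_list[num_train:num_train + num_val]
test_part = data_list[num_train + num_val:]

print(train_part, val_part, test_part)
```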
Thank you!
https://github.com/rusty1s/pytorch_geometric/blob/e425622d6efc6832b15e9fe577710a7119d76cef/torch_geometric/datasets/gnn_benchmark_dataset.py#L53-L54
Hello Matthias,
I'd like to know how I can generate the raw data used above. In the original paper, their raw data is loaded here, which seems quite different from yours. Thanks!
Best, Kezhi