🐛 Describe the bug

I'm trying to do metis clustering on ogbn-papers100M dataset that has around 110 million nodes and 1.6 billion edges. First since torch_geometric.data.ClusterData expects the graph to be undirected so I'm trying to do so by from torch_geometric.utils import to_undirected function but this throws core dumped error so to tackle this I did the following:

dataset = PygNodePropPredDataset(name = "ogbn-papers100M",root="/DATA/Mohit/SGFomer_related/data/ogbn_papers100m/dataset") graph = dataset[0] edge_index=graph.edge_index row, col = edge_index[0], edge_index[1] row, col = torch.cat([row, col], dim=0), torch.cat([col, row], dim=0) edge_index = torch.stack([row, col], dim=0) graph.edge_index=edge_index

The above approach works but when I pass the graph to ClusterData, it throws core dumped error again and program stops execution without any explicit error. My code is as follows:

from ogb.nodeproppred import PygNodePropPredDataset from torch_geometric.data import ClusterData import torch from torch_geometric.utils import to_undirected num_partitions=10000 dataset = PygNodePropPredDataset(name = "ogbn-papers100M",root="path") graph = dataset[0] edge_index=graph.edge_index row, col = edge_index[0], edge_index[1] row, col = torch.cat([row, col], dim=0), torch.cat([col, row], dim=0) edge_index = torch.stack([row, col], dim=0) graph.edge_index=edge_index

graph.edge_index = to_undirected(graph.edge_index, num_nodes=graph.num_nodes)

split_idx = dataset.get_idx_split() save_path = $path$

adding split ids inside the graph itself (Cluster-GCN idea)

for key, idx in split_idx.items(): mask = torch.zeros(graph.num_nodes, dtype=torch.bool) mask[idx] = True graph[f'{key}_mask'] = mask

cluster_data = ClusterData(graph, num_parts=num_partitions, recursive=False, save_dir=dataset.processed_dir)

my RAM is of 512 GB, Any workaround for this issue? I'm working on such a large dataset for the first time so help is highly appreciated!

Versions

torch==2.0.1+cu117 [pip3] torch-cluster==1.6.3+pt20cu117 [pip3] torch_geometric==2.4.0 [pip3] torch-scatter==2.1.2+pt20cu117 [pip3] torch-sparse==0.6.18+pt20cu117 [pip3] torch-spline-conv==1.2.2+pt20cu117 [pip3] torch-tb-profiler==0.4.3 [pip3] torchaudio==2.0.2+cu117 [pip3] torchvision==0.15.2+cu117

pyg-team / pytorch_geometric

ClusterData not working for papers-100M dataset #9325

🐛 Describe the bug

graph.edge_index = to_undirected(graph.edge_index, num_nodes=graph.num_nodes)

adding split ids inside the graph itself (Cluster-GCN idea)

Versions