MartinSchmitz95 opened 2 years ago
We cannot control the METIS execution, and as far as I know it expects an undirected graph as input. As such, I recommend that you pass in an undirected graph, collect the partitions and node ids, and then apply them to your directed graph via `data.subgraph()`. WDYT?
Thank you very much. My code looks like this now and it seems to work.
```python
import torch
import torch_geometric.transforms as T
from torch_geometric import loader

# Set node ids in the graph manually:
graph.node_ids = torch.arange(graph.num_nodes)

# Transform the graph to an undirected one:
transform = T.ToUndirected()
undir_graph = transform(graph)

# Run METIS on the undirected graph:
train_cluster_data = loader.ClusterData(undir_graph, num_parts=num_clusters, recursive=False, save_dir='../data/cache')
train_loader = loader.ClusterLoader(train_cluster_data, batch_size=batch_size, shuffle=True)

# Take the node ids of each METIS partition and create a subgraph
# of the original directed graph out of it:
for data in train_loader:
    data = graph.subgraph(data.node_ids)
```
I am not sure if my manual id setting with `torch.arange` works as intended, though.
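As a quick sanity check (a minimal sketch, assuming the `graph` and `train_loader` objects from the snippet above), you can verify that the ids in each batch index back into the original graph and that the partitions cover every node exactly once:

```python
# Hypothetical sanity check: every id must be a valid index into the
# original graph, and no node may appear in more than one partition.
seen = torch.zeros(graph.num_nodes, dtype=torch.bool)
for data in train_loader:
    ids = data.node_ids
    assert int(ids.max()) < graph.num_nodes  # ids are valid indices
    assert not bool(seen[ids].any())         # no node in two partitions
    seen[ids] = True
assert bool(seen.all())                      # all nodes are covered
```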
I think this looks correct. Does it work? :)
It works, I think we can close this thread. Thanks a lot for your help :)
Just one thing I also want to mention: I have edge features in my graph `data.e`. After using the subgraph function as shown, the edge feature matrix stays the same. In order to retrieve only the edge features of the subgraph, I have to take `data.e[data.edge_index][0]`.
This shouldn't be the case. `subgraph` should be able to handle both node and edge features. Can you show me an example?
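For reference, a minimal sketch of the behavior I would expect (standalone toy data; whether a custom attribute name like `e` is picked up as an edge-level attribute is an assumption on my side, not something confirmed in this thread):

```python
import torch
from torch_geometric.data import Data

# Toy directed graph: 4 nodes, 4 edges, one scalar feature per edge.
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])
edge_attr = torch.arange(4).view(-1, 1).float()
data = Data(x=torch.randn(4, 8), edge_index=edge_index, edge_attr=edge_attr)

# Induced subgraph on nodes {0, 1, 2}: only edges 0->1 and 1->2 survive.
sub = data.subgraph(torch.tensor([0, 1, 2]))
print(sub.edge_index)  # relabeled edge index of the surviving edges
print(sub.edge_attr)   # expected: rows 0 and 1 of the original edge_attr
```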
Hmm, I tried to replicate the problem on my local machine, and there `subgraph` works perfectly fine. It only behaves like this when I run it on my server. It could be related to CUDA.
Interesting. Let me know if you can share your data or have some additional pointers on where the error might come from. You can also run your script with the env variable `CUDA_LAUNCH_BLOCKING=1` for better error messages.
@rusty1s This code replicates the issue. If you set the argument `force_undirected=True`, the error no longer occurs. The error occurs in the `metis.py` file in `torch_sparse`, on line 67: `cluster = torch.ops.torch_sparse.partition(rowptr, col, value, num_parts, recursive)`.
```python
import torch
from torch_geometric.datasets.amazon import Amazon
from torch_geometric.loader import ClusterData, ClusterLoader
from torch_geometric.data import Data
from torch_geometric.utils import dropout_edge


def set_random(random_seed: int):
    torch.manual_seed(random_seed)
    torch.cuda.manual_seed_all(random_seed)


set_random(42)

dataset_name = 'Computers'
n_clusters = 10

data = Amazon(root=f'data/{dataset_name}', name=dataset_name)[0]
cluster_data = ClusterData(data, num_parts=n_clusters)
train_loader = ClusterLoader(cluster_data, batch_size=1, shuffle=False)

for i, batch in enumerate(train_loader):
    # With force_undirected=False, the dropped-out edge index is directed
    # again, and the second ClusterData call below segfaults inside METIS.
    train_edge_index, train_edge_mask = dropout_edge(batch.edge_index, p=0.7, force_undirected=False)
    split_data = Data(x=batch.x, y=batch.y, edge_index=train_edge_index)
    cluster_data = ClusterData(split_data, num_parts=10)
```
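For anyone landing here: based on the observation above that `force_undirected=True` avoids the crash, a minimal workaround sketch (using PyG's `to_undirected` utility; my suggestion rather than an official fix) is to symmetrize the edge index before handing it back to METIS:

```python
from torch_geometric.utils import to_undirected

# Replacement for the last two lines of the loop above: symmetrize the
# dropped-out edge index so the second ClusterData call sees an
# undirected graph. Alternatively, pass force_undirected=True to
# dropout_edge, which also keeps the batch undirected.
sym_edge_index = to_undirected(train_edge_index)
split_data = Data(x=batch.x, y=batch.y, edge_index=sym_edge_index)
cluster_data = ClusterData(split_data, num_parts=10)
```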
🐛 Describe the bug
Hello, I am trying to divide a directed graph into clusters using METIS:

`loader.ClusterData(graph, num_parts=100, recursive=False)`

`ClusterData` works as long as `num_parts` is very small (< 20). As soon as I choose a higher value like 100, it crashes with a segmentation fault. When I convert my graph into an undirected graph it works without problems, but I would like to keep the directed graph.
Is there a fix for this METIS problem? Or is there a workaround, maybe to reconstruct the directed graph after the METIS partitioning?
Environment