pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

`HeteroData.subgraph()` #4001

Closed rusty1s closed 2 years ago

rusty1s commented 2 years ago

🚀 The feature, motivation and pitch

Similar to Data.subgraph(), there should exist a HeteroData.subgraph() method to compute subgraphs in a heterogeneous graph setting, e.g., for obtaining inductive node splits. Here, the mask/index argument should be a dict holding masks or indices for all node types, or for a subset of them:

hetero_data.subgraph({'paper': mask})
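To make the requested semantics concrete, here is a minimal, dependency-free sketch of what such a call could compute. It is not the library's implementation: `hetero_subgraph` is a hypothetical helper, plain Python lists stand in for tensors, and it assumes that node types missing from the dict are kept whole and that surviving nodes are relabeled to consecutive indices.

```python
from typing import Dict, List, Tuple

EdgeType = Tuple[str, str, str]  # (source node type, relation, destination node type)


def hetero_subgraph(
    num_nodes: Dict[str, int],
    edges: Dict[EdgeType, List[Tuple[int, int]]],
    mask_dict: Dict[str, List[bool]],
):
    """Hypothetical helper: restrict a heterogeneous graph to the masked nodes."""
    # Old-to-new index of every kept node, per node type. Node types missing
    # from mask_dict are kept whole (an assumption about the intended semantics).
    relabel: Dict[str, Dict[int, int]] = {}
    for node_type, n in num_nodes.items():
        mask = mask_dict.get(node_type, [True] * n)
        kept = [i for i in range(n) if mask[i]]
        relabel[node_type] = {old: new for new, old in enumerate(kept)}

    # Keep an edge only if both endpoints survive, then relabel its endpoints.
    new_edges = {}
    for (src_type, rel, dst_type), pairs in edges.items():
        src_map, dst_map = relabel[src_type], relabel[dst_type]
        new_edges[(src_type, rel, dst_type)] = [
            (src_map[s], dst_map[d])
            for s, d in pairs
            if s in src_map and d in dst_map
        ]
    new_num_nodes = {t: len(m) for t, m in relabel.items()}
    return new_num_nodes, new_edges


num_nodes = {"paper": 4, "author": 2}
edges = {
    ("author", "writes", "paper"): [(0, 0), (0, 1), (1, 2), (1, 3)],
    ("paper", "cites", "paper"): [(0, 1), (1, 2), (3, 0)],
}
# Drop paper 1; authors are not mentioned in the dict, so all of them are kept.
sub_nodes, sub_edges = hetero_subgraph(
    num_nodes, edges, {"paper": [True, False, True, True]}
)
```

In this toy graph, every edge touching paper 1 disappears, and the remaining paper indices are shifted down so they stay contiguous.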

Alternatives

No response

Additional context

No response

michalisfrangos commented 2 years ago

Hi @rusty1s, I was about to open a new discussion and just realized you are already on this. Just commenting to share my interest in this feature. Cheers.

wsad1 commented 2 years ago

@michalisfrangos are you interested in contributing this feature?

rusty1s commented 2 years ago

Pinging @mananshah99 and @sdulloor here who shared interest in contributing this feature as well.

wsad1 commented 2 years ago

It might be useful to implement a utils.subgraph_bipartite(subset: Tuple[torch.Tensor, torch.Tensor], ...) or add support to utils.subgraph for bipartite graphs. I prefer adding a new function over modifying the existing one, to keep the code cleaner. That way, HeteroData.subgraph() would make multiple calls to subgraph_bipartite. Something like

def subgraph(self, node_mask_dict):
    ...
    for edge_type in self.edge_types:
        if edge_type[0] in node_mask_dict and ...:
            new_edge, _, _ = utils.subgraph_bipartite(
                (node_mask_dict[edge_type[0]], node_mask_dict[edge_type[-1]]))

WDYT?

rusty1s commented 2 years ago

Yes, this looks good to me. Although we already overload a lot of functionality with bipartite graph support (by passing tuples instead of single tensors), I agree that adding this directly to subgraph might make the code overly complex. bipartite_subgraph is a good alternative that we do not even have to expose.
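For illustration, a helper along these lines might behave roughly as follows. This is a dependency-free sketch, not the library's code: the name `bipartite_subgraph` follows the thread, but the signature, the use of plain lists instead of tensors, and the `relabel_nodes` flag are assumptions made for the example.

```python
from typing import List, Tuple


def bipartite_subgraph(
    subset: Tuple[List[bool], List[bool]],
    edge_index: Tuple[List[int], List[int]],
    relabel_nodes: bool = False,
) -> Tuple[List[int], List[int]]:
    """Illustrative only: keep edges whose source and destination are both selected."""
    src_mask, dst_mask = subset
    row, col = edge_index
    # An edge survives only if its source AND its destination are selected.
    keep = [e for e in range(len(row)) if src_mask[row[e]] and dst_mask[col[e]]]
    row = [row[e] for e in keep]
    col = [col[e] for e in keep]
    if relabel_nodes:
        # Source and destination nodes live in separate index spaces,
        # so each side gets its own old-to-new mapping.
        src_kept = [i for i, m in enumerate(src_mask) if m]
        dst_kept = [i for i, m in enumerate(dst_mask) if m]
        src_new = {old: new for new, old in enumerate(src_kept)}
        dst_new = {old: new for new, old in enumerate(dst_kept)}
        row = [src_new[r] for r in row]
        col = [dst_new[c] for c in col]
    return row, col


edge_index = ([0, 0, 1, 2], [1, 2, 0, 2])
subset = ([True, False, True], [False, True, True])
kept = bipartite_subgraph(subset, edge_index)
relabeled = bipartite_subgraph(subset, edge_index, relabel_nodes=True)
```

The point of a separate helper is visible in the relabeling step: a homogeneous subgraph function needs one index mapping, while the bipartite case needs two independent ones.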

nabsabraham commented 2 years ago

how would this be different from just sampling a heterogeneous graph with large neighbourhoods to get different node types in the new sampled bipartite graph?

rusty1s commented 2 years ago

Not sure I understand. Can you clarify? The subgraph() method might be useful to gather subgraphs prior to any training or sampling, e.g. for obtaining inductive subgraphs based on a pre-defined split.

nabsabraham commented 2 years ago

So I have a transductive problem (for now), and for hetero GNN classification I am planning to just use the HGTLoader to get smaller batches for a list of nodes to train my model. Does that setup seem correct? I'm not sure how, or if, I should be using something like the subgraph() method (whenever it's implemented).

rusty1s commented 2 years ago

It depends on which data you want to train on. If you want to shrink the data prior to training, then HeteroData.subgraph would be applicable to create a smaller subgraph from your original graph. If you just want to operate on smaller batches during training, then you may want to adjust the batch_size argument of a loader.

Let me know if that makes sense to you.
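The distinction can be sketched without any library code. The two functions below are hypothetical stand-ins, not PyG APIs: `shrink_once` plays the role of taking a subgraph before training, and `mini_batches` plays the role of a loader's batch_size argument.

```python
from typing import Iterator, List


def shrink_once(nodes: List[int], keep_mask: List[bool]) -> List[int]:
    """Option 1: drop nodes before training; the result is a smaller, fixed dataset."""
    return [n for n, keep in zip(nodes, keep_mask) if keep]


def mini_batches(seed_nodes: List[int], batch_size: int) -> Iterator[List[int]]:
    """Option 2: keep all nodes, but visit them in chunks of batch_size per step."""
    for start in range(0, len(seed_nodes), batch_size):
        yield seed_nodes[start:start + batch_size]


nodes = list(range(6))
# Shrinking: nodes 2, 4 and 5 are gone for the whole training run.
train_nodes = shrink_once(nodes, [True, True, False, True, False, False])
# Batching: every node is still seen, just not all in the same step.
batches = list(mini_batches(nodes, batch_size=4))
```

Shrinking changes which data the model can ever see, while batching only changes how much of it is processed per step.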