pyg-team / pytorch_geometric

Graph Neural Network Library for PyTorch
https://pyg.org
MIT License

GNN Explainability Dataset Generation #5817

RexYing closed this issue 1 year ago.

RexYing commented 2 years ago

🚀 The feature, motivation and pitch

Provide support for synthetic datasets commonly used in explainability papers.

This is part of the explainability roadmap #5520.

Create a high-level API for the following functionalities (each individual sub-task will be specified in a separate issue).

Synthetic datasets are often useful for explainability. While they are not always the most accurate benchmark for GNN explainability, they can be used to validate explainability algorithms, to debug models, and to provide ground truths for certain evaluations, such as identifying important subgraph structures.

Following GNNExplainer, the dataset construction has two parts: generating a base graph and attaching motifs to it.
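The two-part construction can be sketched in plain Python as below. This is a minimal sketch, not PyG code: it assumes a Barabási-Albert base graph grown by preferential attachment and GNNExplainer-style "house" motifs attached to random base nodes, and it simplifies node labels to binary (0 = base node, 1 = motif node) instead of the per-role labels GNNExplainer uses. The function names `ba_graph` and `attach_motifs` and all sizes are hypothetical.

```python
import random


def ba_graph(num_nodes, m, rng):
    """Barabási-Albert graph: each new node links to m existing nodes,
    chosen with probability proportional to their degree."""
    edges = set()
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []  # every node appears once per incident edge
    for source in range(m, num_nodes):
        edges.update((t, source) for t in targets)  # t < source always
        repeated.extend(targets)
        repeated.extend([source] * m)
        chosen = set()
        while len(chosen) < m:  # m distinct, degree-weighted targets
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


# House motif: a 4-cycle 0-1-2-3-0 plus a roof node 4 joined to nodes 0 and 1.
HOUSE_EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)]


def attach_motifs(base_edges, num_base, motif_edges, motif_size, num_motifs, rng):
    """Return (edges, labels); each edge is (u, v, is_motif_edge)."""
    edges = [(u, v, False) for u, v in sorted(base_edges)]
    labels = [0] * num_base  # hypothetical binary labels: 0 = base node
    for k in range(num_motifs):
        offset = num_base + k * motif_size
        edges += [(u + offset, v + offset, True) for u, v in motif_edges]
        labels += [1] * motif_size  # 1 = motif node
        anchor = rng.randrange(num_base)  # random base node to attach to
        edges.append((anchor, offset, False))  # connecting edge, not ground truth
    return edges, labels


rng = random.Random(0)
base = ba_graph(300, 5, rng)
edges, labels = attach_motifs(base, 300, HOUSE_EDGES, 5, 80, rng)
```

The `True` flags on motif edges are the ground truth an explainer is later scored against. Marking the single edge that joins each motif to the base graph as `False` is a design choice in this sketch, not a fixed convention.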

List of tasks:

PyG could provide a general routine and framework to construct these datasets. For base graphs, aside from those mentioned, it is also worth considering GraphWorld (KDD 2022), which covers a wide range of structural characteristics and is useful for real-world GNN research. For motifs, we can consider the motif atlas (e.g., all motifs of size 4, 5, 6, 7 ...). A user can then construct a custom dataset by picking a random seed, a base graph generator, and a motif, and evaluate an explainer by its ability to identify the selected motif as the important subgraph.
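One way to score "the ability to identify the selected motif" is to treat an explainer's per-edge importance scores as a ranking and compare it with the ground-truth motif edges. The sketch below assumes ROC AUC and top-k precision as metrics; these are common choices but are not prescribed by this issue, and the function names are hypothetical.

```python
def motif_auc(edge_scores, is_motif_edge):
    """ROC AUC of edge-importance scores against ground-truth motif edges:
    the probability that a random motif edge outscores a random non-motif
    edge, with ties counted as 1/2."""
    pos = [s for s, m in zip(edge_scores, is_motif_edge) if m]
    neg = [s for s, m in zip(edge_scores, is_motif_edge) if not m]
    if not pos or not neg:
        raise ValueError("need at least one motif edge and one non-motif edge")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_precision(edge_scores, is_motif_edge):
    """Fraction of the k highest-scoring edges that are motif edges,
    where k is the number of ground-truth motif edges."""
    k = sum(is_motif_edge)
    ranked = sorted(range(len(edge_scores)), key=lambda i: edge_scores[i], reverse=True)
    return sum(is_motif_edge[i] for i in ranked[:k]) / k


# Toy explainer output for five edges, the first two belonging to the motif.
scores = [0.9, 0.3, 0.1, 0.7, 0.2]
truth = [True, True, False, False, False]
auc = motif_auc(scores, truth)  # 5/6: edge 1 (0.3) is outscored by edge 3 (0.7)
precision = top_k_precision(scores, truth)  # 0.5: top-2 edges are 0 and 3
```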

To ensure reproducibility, the dataset generator class will have options to set a standard seed, a set of standard base graph generators, and a set of motif patterns. Researchers can then use this deterministic setting to obtain a standard set of benchmark datasets for explainability.
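A deterministic benchmark largely comes down to deriving every random choice from one seeded, local random number generator and shipping named presets. The sketch below illustrates that idea with hypothetical names (`BenchmarkConfig`, `MOTIFS`, `STANDARD_PRESETS`, `generate`) and illustrative values; it uses a balanced binary tree as the base graph, so the seed only controls where motifs are attached.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkConfig:
    seed: int
    base_depth: int  # depth of the balanced binary tree used as base graph
    motif: str       # key into MOTIFS
    num_motifs: int


# Hypothetical motif table: undirected edge lists on local node ids.
MOTIFS = {
    "house": [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)],
    "cycle6": [(i, (i + 1) % 6) for i in range(6)],
}

# Hypothetical registry of standard presets; names and values are illustrative.
STANDARD_PRESETS = {
    "tree-cycles-small": BenchmarkConfig(seed=0, base_depth=5, motif="cycle6", num_motifs=10),
    "tree-house-small": BenchmarkConfig(seed=1, base_depth=5, motif="house", num_motifs=10),
}


def generate(cfg):
    """Return the edge list for one preset; depends only on `cfg`."""
    rng = random.Random(cfg.seed)  # private RNG: no global random state involved
    num_base = 2 ** (cfg.base_depth + 1) - 1
    edges = [((i - 1) // 2, i) for i in range(1, num_base)]  # parent-child tree edges
    motif = MOTIFS[cfg.motif]
    size = 1 + max(max(e) for e in motif)
    for k in range(cfg.num_motifs):
        offset = num_base + k * size
        edges += [(u + offset, v + offset) for u, v in motif]
        edges.append((rng.randrange(num_base), offset))  # seeded attachment point
    return edges


a = generate(STANDARD_PRESETS["tree-cycles-small"])
b = generate(STANDARD_PRESETS["tree-cycles-small"])
```

Because `random.Random(cfg.seed)` is a private generator, `a` and `b` are identical edge lists regardless of any other code that uses the global `random` module.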

Alternatives

No response

Additional context

No response

rfdavid commented 1 year ago

Here is a draft idea regarding a potential architecture for this issue:

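```python
from abc import ABC, abstractmethod

from torch_geometric.data import InMemoryDataset


class GraphGenerator(ABC):
    @abstractmethod
    def process(self): ...  # build the base graph

    def generate_labels(self): ...


class BAGraph(GraphGenerator):
    def process(self): ...  # Barabási-Albert base graph


class GraphWorld(GraphGenerator):
    def process(self): ...  # GraphWorld base graph


class MotifGenerator:  # generate motif based on the chosen structure
    ...


class ExplainerDataset(InMemoryDataset):  # data/synthetic_dataset.py
    def __init__(self, generator, motif, seed=None):
        self.generator, self.motif, self.seed = generator, motif, seed
        super().__init__()  # no root: the dataset lives in memory only

    def attach_motif(self): ...


# Interface for the user (other constructor arguments elided; seed=0 is illustrative)
generator = GraphWorld()
motif = MotifGenerator()
dataset = ExplainerDataset(generator=generator, motif=motif, seed=0)
```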
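For reference, the same architecture can be made fully runnable without PyG by swapping `InMemoryDataset` for a plain class. In this sketch `PathGraph` is a hypothetical toy generator standing in for `BAGraph`/`GraphWorld`, and the `num_motifs` parameter and the `process(rng)` signature are assumptions, not part of the draft.

```python
import random
from abc import ABC, abstractmethod


class GraphGenerator(ABC):
    """Produces a base graph; subclasses only implement `process`."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes

    @abstractmethod
    def process(self, rng):
        """Return a list of undirected (u, v) edges on nodes 0..num_nodes-1."""

    def generate_labels(self):
        return [0] * self.num_nodes  # base-graph nodes get label 0


class PathGraph(GraphGenerator):
    """Hypothetical toy generator standing in for BAGraph / GraphWorld."""

    def process(self, rng):
        return [(i, i + 1) for i in range(self.num_nodes - 1)]


class MotifGenerator:
    def __init__(self, edges):
        self.edges = edges
        self.num_nodes = 1 + max(max(e) for e in edges)


class ExplainerDataset:
    """Plain-Python stand-in for the InMemoryDataset subclass in the draft."""

    def __init__(self, generator, motif, num_motifs, seed=0):
        rng = random.Random(seed)  # one seed drives every random choice
        self.edges = generator.process(rng)
        self.labels = generator.generate_labels()
        self.attach_motif(generator.num_nodes, motif, num_motifs, rng)

    def attach_motif(self, num_base, motif, num_motifs, rng):
        for _ in range(num_motifs):
            offset = len(self.labels)  # next free node id
            self.edges += [(u + offset, v + offset) for u, v in motif.edges]
            self.edges.append((rng.randrange(num_base), offset))
            self.labels += [1] * motif.num_nodes  # motif nodes get label 1


house = MotifGenerator([(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (1, 4)])
dataset = ExplainerDataset(PathGraph(20), house, num_motifs=3, seed=0)
```

Passing the seeded `rng` into `process` keeps both the base graph and the motif placement reproducible from a single `seed`, in line with the reproducibility goal stated in the issue.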