qitianwu / NodeFormer

The official implementation of NeurIPS22 spotlight paper "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification"
286 stars 27 forks source link
graph-neural-networks graph-structure-learning graph-transformer image-classification large-graph neurips-2022 node-classification pytorch pytorch-geometric relational-learning text-classification

NodeFormer: A Graph Transformer with Linear Complexity

The official implementation for "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification" which is accepted to NeurIPS22 as a spotlight presentation.

Related materials: [paper], [slides], [blog Chinese | English], [vedio Chinese | English], [tutorial]

Two follow-up works: DIFFormer (ICLR2023 spotlight) a physics-inspired scalable Transformer, SGFormer (NeurIPS2023) a simplified Transformer scaling to billion-scale graphs

What's news

This work takes an initial step for exploring Transformer-style graph encoder networks for large node classification graphs, dubbed as NodeFormer, as an alternative to common Graph Neural Networks, in particular for encoding nodes in an input graph into embeddings in latent space.

The highlights of NodeFormer

NodeFormer is a pioneering Transformer model for node classification on large graphs. NodeFormer scales all-pair message passing with efficient latent structure learning to million-level nodes. NodeFormer has several merits:

Structures of the Codes

The key module of NodeFormer is the kernelized (Gumbel-)Softmax message passing which achieves all-pair message passing on a latent graph in each layer with $O(N)$ complexity. The nodeformer.py implements our model:

For other files, the descriptions are below:

Datasets

We provide an easy access to the used datasets in the Google drive. This also contains other commonly used graph datasets, except the large-scale graphs OGBN-Proteins and Amazon2M which can be downloaded automatically with our codes See here for how to get the datasets ready for running our codes.

The information and sources of datasets are summarized below

Key results

Dataset Split Metric Result Hyper-parameters/Checkpoints
Cora random 50%/25%/25% Accuracy 88.80 (0.26) train script
CiteSeer random 50%/25%/25% Accuracy 76.33 (0.59) train script
Deezer random 50%/25%/25% ROC-AUC 71.24 (0.32) train script
Actor random 50%/25%/25% Accuracy 35.31 (0.89) train script
OGBN-Proteins public split ROC-AUC 77.45 (1.15) train script, checkpoint, test script
Amazon2M random 50%/25%/25% Accuracy 87.85 (0.24) train script, checkpoint and fixed splits, test script
Mini-ImageNet (kNN, k=5) random 50%/25%/25% Accuracy 86.77 (0.45) train script
Mini-ImageNet (no graph) random 50%/25%/25% Accuracy 87.46 (0.36) train script
20News-Group (kNN, k=5) random 50%/25%/25% Accuracy 66.01 (1.18) train script
20News-Group (no graph) random 50%/25%/25% Accuracy 64.71 (1.33) train script
Cora 20 nodes per class for train Accuracy 83.4 (0.2) train script
CiteSeer 20 nodes per class for train Accuracy 73.0 (0.3) train script
Pubmed 20 nodes per class for train Accuracy 81.5 (0.4) train script

How to run our codes?

  1. Install the required package according to requirements.txt

  2. Create a folder ../data and download the datasets from here (For large graph datasets Proteins and Amazon2M, the datasets will be automatically downloaded)

  3. To run the training and evaluation on eight datasets we used, one can use the scripts in run.sh:

# node classification on small datasets
python main.py --dataset cora --rand_split --metric acc --method nodeformer --lr 0.001 \
--weight_decay 5e-3 --num_layers 2 --hidden_channels 32 --num_heads 4 --rb_order 2 --rb_trans sigmoid \
--lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --runs 5 --epochs 1000 --device 0

# node classification on large datasets
python main-batch.py --dataset ogbn-proteins --metric rocauc --method nodeformer --lr 1e-2 \
--weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity \
--lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 10000 \
--runs 5 --epochs 1000 --eval_step 9 --device 0

# image and text datasets
python main.py --dataset mini --metric acc --rand_split --method nodeformer --lr 0.001\
--weight_decay 5e-3 --num_layers 2 --hidden_channels 128 --num_heads 6\
--rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel \
--run 5 --epochs 300 --device 0
  1. We also provide the checkpoints of NodeFormer on two large datasets, OGBN-Proteins and Amazon2M. One can download the trained models into ../model/ and run the scripts in run_test_large_graph.sh for reproducing the results.

Potential Applications and More Usability

NodeFormer can in principle be applied to solve three families of tasks:

Our work takes an initial step for exploring how to build a scalable graph Transformer model for node classification, and we also believe there exists ample room for further research and development as future works. One can also use our implementation kernelized_softmax() and kernelized_gumbel_softmax() for related projects concerning e.g., structure learning and communication, where the scalability matters.

Citation

If you find our codes useful, please consider citing our work

      @inproceedings{wu2022nodeformer,
      title = {NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification},
      author = {Qitian Wu and Wentao Zhao and Zenan Li and David Wipf and Junchi Yan},
      booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
      year = {2022}
      }

ACK

We acknowledge the implementation of the softmax kernel

https://github.com/lucidrains/performer-pytorch

and the training pipeline for GNN node classification

https://github.com/CUAI/Non-Homophily-Benchmarks