# PRODIGY: Enabling In-context Learning Over Graphs
A pretraining framework that enables in-context learning over graphs: pretrain a graph model once, then adapt it to diverse downstream tasks on unseen graphs without any parameter optimization!
Paper: https://arxiv.org/abs/2305.12600 (short paper accepted at SPIGM @ ICML 2023)
Authors: Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy Liang, Jure Leskovec
## Setup
```
pip install -r requirements.txt
pip install pyg_lib torch_scatter torch_sparse torch_cluster -f https://data.pyg.org/whl/torch-2.0.1+cu117.html
```
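To sanity-check the install before downloading any data, a quick import test can help (a minimal sketch, assuming `torch_geometric` is pulled in by `requirements.txt`; exact versions depend on the wheel index above):

```
# Check that torch sees the GPU and that the PyG extension packages import cleanly.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import torch_geometric, torch_scatter, torch_sparse, torch_cluster; print(torch_geometric.__version__)"
```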
All datasets should be prepared in individual folders under `<DATA_ROOT>`. For MAG240M and arXiv, the datasets will be downloaded and processed there automatically. If you run into memory issues when generating the adjacency matrix, we also provide a preprocessed MAG adjacency matrix that should be placed under `<DATA_ROOT>/mag240m` after the OGB download.

For KGs, download the preprocessed Wiki and FB15K-237 datasets to `<DATA_ROOT>`. Download the other KG datasets (NELL and ConceptNet) similarly, following the links in https://github.com/snap-stanford/csr.
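After setup, `<DATA_ROOT>` should look roughly like the following (a sketch; apart from `mag240m`, the folder names are assumptions based on the dataset names):

```
<DATA_ROOT>/
├── mag240m/      # MAG240M; optionally add the preprocessed adjacency matrix here
├── arxiv/        # downloaded and processed automatically
├── Wiki/         # preprocessed KG dataset (assumed folder name)
├── FB15K-237/    # preprocessed KG dataset (assumed folder name)
├── NELL/         # see the CSR repo for download links
└── ConceptNet/   # see the CSR repo for download links
```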
## Pretraining and Evaluation Commands
### PRODIGY pretraining on MAG240M
```
python experiments/run_single_experiment.py --dataset mag240m --root <DATA_ROOT> --original_features True -ds_cap 50010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,M -lr 3e-4 -way 30 -shot 3 -qry 4 -eval_step 1000 -task cls_nm_sb -bs 1 -aug ND0.5,NZ0.5 -aug_test True -attr 1000 --device 0 --prefix MAG_PT_PRODIGY
```
`--prefix` sets the run-name prefix in wandb; checkpoints will be saved to `./state/MAG_PT_PRODIGY/checkpoint/`.
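Runs are logged to wandb. If you do not want to sync logs to the wandb server, wandb's standard `WANDB_MODE` environment variable can be set (a built-in wandb feature, not a repo-specific flag):

```
# Same pretraining command as above, but with wandb's offline mode
# so nothing is synced to the wandb server.
WANDB_MODE=offline python experiments/run_single_experiment.py [same arguments as above]
```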
### PRODIGY evaluation on arXiv
```
python experiments/run_single_experiment.py --dataset arxiv --root <DATA_ROOT> -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,M -way 3 -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens -pretrained <PATH_TO_CHECKPOINT> --eval_only True --train_cap 10 --device 0
```
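For example, to evaluate a checkpoint produced by the MAG240M pretraining run above, point `-pretrained` at a file under the checkpoint directory. The filename placeholder below is hypothetical; check `./state/MAG_PT_PRODIGY/checkpoint/` for the actual names:

```
# Evaluate a pretraining checkpoint on arXiv; <CHECKPOINT_FILE> is a placeholder,
# not an actual filename from this repo.
python experiments/run_single_experiment.py --dataset arxiv --root <DATA_ROOT> -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,M -way 3 -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens -pretrained ./state/MAG_PT_PRODIGY/checkpoint/<CHECKPOINT_FILE> --eval_only True --train_cap 10 --device 0
```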
## Commands for Other Configurations and Datasets

### Pretraining for PG-NM and PG-MT

Evaluation commands are the same as for PRODIGY.
```
python experiments/run_single_experiment.py --dataset mag240m --root <DATA_ROOT> --original_features True -ds_cap 10010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,M -lr 3e-4 -way 30 -shot 3 -qry 4 -eval_step 500 -task neighbor_matching -bs 1 -aug ND0.5,NZ0.5 -aug_test True -attr 1000 --device 0 --prefix MAG_PG_NM
python experiments/run_single_experiment.py --dataset mag240m --root <DATA_ROOT> --original_features True -ds_cap 10010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,M -lr 3e-4 -way 30 -shot 3 -qry 4 -eval_step 500 -task classification -bs 1 -aug ND0.5,NZ0.5 -aug_test True -attr 1000 --device 0 --prefix MAG_PG_MT
```
### Pretraining for Contrastive
```
python experiments/run_single_experiment.py --dataset mag240m --root <DATA_ROOT> --original_features True --input_dim 768 --emb_dim 256 -ds_cap 10010 -val_cap 100 -test_cap 100 --epochs 1 -ckpt_step 1000 -layers S2,U,A -lr 1e-3 -way 30 -shot 1 -qry 4 -eval_step 500 -task same_graph -bs 1 -aug ND0.5,NZ0.5 -aug_test True --device 0 --prefix MAG_Contrastive
```
### Evaluation for Contrastive
```
python experiments/run_single_experiment.py --dataset arxiv --root <DATA_ROOT> --emb_dim 256 --input_dim 768 -ds_cap 510 -val_cap 510 -test_cap 500 -eval_step 100 -epochs 1 --layers S2,U,A -way 3 -shot 3 -qry 3 -lr 1e-5 -bert roberta-base-nli-stsb-mean-tokens -pretrained <PATH_TO_CHECKPOINT> --eval_only True --train_cap 10 --device 0
```
Run `kg_commands.py` for example pretraining and evaluation commands on the KG datasets (uncomment the code inside for the full set of commands).

This repo also includes preprocessing and data-loading code for several graph datasets; see `DATASETS.md` for dataset details.
## Citations
If you use this repo, please cite the paper below. This repo reuses code from CSR (https://github.com/snap-stanford/csr) for KG dataset loading.
```
@article{Huang2023PRODIGYEI,
  title={PRODIGY: Enabling In-context Learning Over Graphs},
  author={Qian Huang and Hongyu Ren and Peng Chen and Gregor Kr\v{z}manc and Daniel Zeng and Percy Liang and Jure Leskovec},
  journal={ArXiv},
  year={2023},
  volume={abs/2305.12600}
}
```