This repository is built for the paper Towards Better Dynamic Graph Learning: New Architecture and Unified Library.
If you have any questions or suggestions, please feel free to let us know. You can directly email Le Yu using the email address yule@buaa.edu.cn or post an issue on this repository.
Dynamic Graph Library (DyGLib) is an open-source toolkit with standard training pipelines, extensible coding interfaces, and comprehensive evaluation strategies, which aims to promote standard, scalable, and reproducible dynamic graph learning research. DyGLib includes diverse benchmark datasets and thorough baselines.
Fourteen datasets are used in DyGLib, including Wikipedia, Reddit, MOOC, LastFM, Myket, Enron, Social Evo., UCI, Flights, Can. Parl., US Legis., UN Trade, UN Vote, and Contact. The first five datasets are bipartite, and the others contain nodes of a single type.
Most of the original dynamic graph datasets come from Towards Better Evaluation for Dynamic Link Prediction and can be downloaded here. Please download them and put them in the DG_data folder.
The Myket dataset comes from Effect of Choosing Loss Function when Using T-batching for Representation Learning on Dynamic Networks and can be accessed from here. The original and preprocessed files for the Myket dataset are included in this repository.
We can run preprocess_data/preprocess_data.py to preprocess the datasets. For example, to preprocess the Wikipedia dataset, we can run the following commands:
cd preprocess_data/
python preprocess_data.py --dataset_name wikipedia
We can also run the following commands to preprocess all the original datasets at once:
cd preprocess_data/
python preprocess_all_data.py
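For convenience, the per-dataset preprocessing can also be scripted in Python, as in the minimal sketch below. It assumes the loop is run from the repository root, and the dataset name strings other than wikipedia (which is confirmed by the commands above) are assumptions that should match what preprocess_data.py expects:

import subprocess

# Minimal sketch: preprocess several datasets in one go by invoking the existing
# preprocessing script once per dataset. Dataset names other than 'wikipedia' are
# assumptions here; use the names expected by preprocess_data.py.
for dataset_name in ['wikipedia', 'reddit', 'mooc', 'lastfm']:
    subprocess.run(['python', 'preprocess_data.py', '--dataset_name', dataset_name],
                   cwd='preprocess_data', check=True)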
Eight popular continuous-time dynamic graph learning methods are included in DyGLib: JODIE, DyRep, TGAT, TGN, CAWN, EdgeBank, TCL, and GraphMixer. Our recent work DyGFormer is also integrated into DyGLib. DyGFormer explores the correlations between the source node and the destination node via a neighbor co-occurrence encoding scheme, and benefits from longer histories both effectively and efficiently via a patching technique.
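For intuition, the minimal sketch below (a toy illustration, not DyGLib's actual implementation) shows one way to form neighbor co-occurrence features: each neighbor in a node's interaction history is encoded by how many times it appears in the source's history and in the destination's history:

import numpy as np

def neighbor_co_occurrence_features(src_history: np.ndarray, dst_history: np.ndarray):
    # Toy sketch of the neighbor co-occurrence idea (hypothetical helper): map each
    # neighbor in a history to [count in source history, count in destination history].
    src_counts = {n: int(np.sum(src_history == n)) for n in np.unique(src_history)}
    dst_counts = {n: int(np.sum(dst_history == n)) for n in np.unique(dst_history)}
    src_features = np.array([[src_counts.get(n, 0), dst_counts.get(n, 0)] for n in src_history], dtype=np.float32)
    dst_features = np.array([[src_counts.get(n, 0), dst_counts.get(n, 0)] for n in dst_history], dtype=np.float32)
    return src_features, dst_features

# Example: neighbor 3 appears in both histories, so it receives high counts for both nodes.
src_features, dst_features = neighbor_co_occurrence_features(np.array([3, 1, 3, 2]), np.array([3, 4, 3, 3]))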
DyGLib supports dynamic link prediction under both transductive and inductive settings with three (i.e., random, historical, and inductive) negative sampling strategies, as well as dynamic node classification.
New datasets and new models are welcome to be incorporated into DyGLib via pull requests.
For new datasets: the format requirements are described in DG_data/DATASETS_README.md. Users can put the new datasets in the DG_data folder, and then run preprocess_data/preprocess_data.py to get the processed datasets.
For new models: users can put the model implementation in the models folder, and then create the model in train_xxx.py or evaluate_xxx.py to run the model (see the hypothetical skeleton sketched below).
The required environments are PyTorch 1.8.1, numpy, pandas, tqdm, and tabulate.
Dynamic link prediction can be performed on all fourteen datasets. If you want to load the best model configurations determined by the grid search, please set the load_best_configs argument to True.
python train_link_prediction.py --dataset_name wikipedia --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0
python train_link_prediction.py --dataset_name wikipedia --model_name DyGFormer --load_best_configs --num_runs 5 --gpu 0
Three (i.e., random, historical, and inductive) negative sampling strategies can be used for model evaluation.
python evaluate_link_prediction.py --dataset_name wikipedia --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --negative_sample_strategy random --num_runs 5 --gpu 0
python evaluate_link_prediction.py --dataset_name wikipedia --model_name DyGFormer --negative_sample_strategy random --load_best_configs --num_runs 5 --gpu 0
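To make the three strategies concrete, the hypothetical helper below (not DyGLib's actual sampler) contrasts random and historical negative sampling of destination nodes; the inductive strategy additionally restricts the pool to interactions that were not seen during training:

import numpy as np

def sample_negative_destinations(all_dst_node_ids, observed_dst_node_ids, size, strategy='random', seed=0):
    # Hypothetical sketch: 'random' draws uniformly from all destination nodes, while
    # 'historical' draws from destinations observed in earlier interactions.
    rng = np.random.default_rng(seed)
    pool = np.unique(all_dst_node_ids) if strategy == 'random' else np.unique(observed_dst_node_ids)
    return rng.choice(pool, size=size, replace=True)

# Example: historical negatives are drawn only from previously seen destinations.
negatives = sample_negative_destinations(np.arange(1000), np.array([3, 7, 7, 42]), size=5, strategy='historical')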
Dynamic node classification can be performed on Wikipedia and Reddit (the only two datasets with dynamic labels).
python train_node_classification.py --dataset_name wikipedia --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0
python train_node_classification.py --dataset_name wikipedia --model_name DyGFormer --load_best_configs --num_runs 5 --gpu 0
python evaluate_node_classification.py --dataset_name wikipedia --model_name DyGFormer --patch_size 2 --max_input_sequence_length 64 --num_runs 5 --gpu 0
python evaluate_node_classification.py --dataset_name wikipedia --model_name DyGFormer --load_best_configs --num_runs 5 --gpu 0
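For intuition, the sketch below shows how dynamic node embeddings produced by a backbone could feed a small binary classification head; the embedding source, head architecture, and loss here are illustrative assumptions rather than DyGLib's exact training code:

import torch
import torch.nn as nn

# Minimal sketch, assuming 172-dimensional dynamic embeddings from a trained backbone.
node_embeddings = torch.randn(32, 172)        # dynamic embeddings of 32 interacting nodes
labels = torch.randint(0, 2, (32,)).float()   # binary dynamic labels (e.g., a state change)

classifier = nn.Sequential(nn.Linear(172, 80), nn.ReLU(), nn.Dropout(0.1), nn.Linear(80, 1))
logits = classifier(node_embeddings).squeeze(dim=-1)
loss = nn.BCEWithLogitsLoss()(logits, labels)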
We are grateful to the authors of TGAT, TGN, CAWN, EdgeBank, and GraphMixer for making their project codes publicly available.
Please consider citing our paper when using this project.
@article{yu2023towards,
title={Towards Better Dynamic Graph Learning: New Architecture and Unified Library},
author={Yu, Le and Sun, Leilei and Du, Bowen and Lv, Weifeng},
journal={Advances in Neural Information Processing Systems},
year={2023}
}