This is a Keras implementation of the symmetrical autoencoder architecture with parameter sharing for the tasks of link prediction and semi-supervised node classification, as described in the following papers:
Tran, Phi Vu. Learning to Make Predictions on Graphs with Autoencoders. Proceedings of the 5th IEEE International Conference on Data Science and Advanced Analytics (2018). Full oral paper.
Tran, Phi Vu. Multi-Task Graph Autoencoders. NIPS 2018 Workshop on Relational Representation Learning. Short poster paper.
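As a rough illustration of what parameter sharing means in a symmetrical autoencoder, the sketch below ties the decoder weights to the transpose of the encoder weights and reconstructs a single node's adjacency row. It is a one-layer NumPy forward pass under simplifying assumptions (ReLU encoder, sigmoid decoder, one adjacency row as input). The names `tied_autoencoder_forward`, `W`, `b_enc` and `b_dec` are hypothetical and not taken from this repository, whose Keras model may use more layers, dropout, and other details.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def tied_autoencoder_forward(a, W, b_enc, b_dec):
    """Reconstruct one adjacency row `a` with shared (tied) weights.

    a:     (N,) adjacency row of one node
    W:     (N, H) encoder weights; the decoder reuses W.T (parameter sharing)
    b_enc: (H,) encoder bias
    b_dec: (N,) decoder bias
    """
    h = np.maximum(0.0, a @ W + b_enc)  # encoder: ReLU(a W + b_enc), shape (H,)
    return sigmoid(h @ W.T + b_dec)     # decoder: sigmoid(h W^T + b_dec), shape (N,)


rng = np.random.default_rng(0)
N, H = 5, 3
W = rng.normal(scale=0.1, size=(N, H))
a = np.array([0, 1, 1, 0, 0], dtype=float)
a_hat = tied_autoencoder_forward(a, W, np.zeros(H), np.zeros(N))
print(a_hat.shape)  # (5,)
```

Because the decoder reuses `W.T`, only one weight matrix per layer pair is learned, which roughly halves the number of weight parameters compared with an untied encoder/decoder of the same shape.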
The code is tested on Ubuntu 16.04 with the following components:
Citation networks from Thomas Kipf and Max Welling. 2016. Semi-Supervised Classification with Graph Convolutional Networks: Cora, Citeseer, Pubmed
Collaboration and social networks from Wang et al. 2016. Structural Deep Network Embedding: Arxiv-GRQC, BlogCatalog
Miscellaneous networks from Aditya Krishna Menon and Charles Elkan. 2011. Link Prediction via Matrix Factorization: Protein, Metabolic, Conflict, PowerGrid
For custom graph datasets, the following are required:
For an example of how to prepare the input dataset, see the load_citation_data() function in utils_gcn.py.
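When preparing a custom graph, one common first step is turning an undirected edge list into a symmetric binary adjacency matrix. The sketch below does this with a dense NumPy array; `edges_to_adjacency` is a hypothetical helper, dropping self-loops is an assumption, and the exact types and shapes expected by the training scripts should be checked against load_citation_data() (which may, for example, return sparse matrices).

```python
import numpy as np


def edges_to_adjacency(edges, num_nodes):
    """Hypothetical helper: undirected edge list -> symmetric 0/1 adjacency matrix."""
    adj = np.zeros((num_nodes, num_nodes), dtype=np.float32)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are dropped
        adj[u, v] = 1.0
        adj[v, u] = 1.0  # undirected graph: mirror every edge
    return adj


edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 3)]
adj = edges_to_adjacency(edges, num_nodes=4)
print(int(adj.sum()))  # 8: four distinct undirected edges, each stored twice
```

For large graphs such as Pubmed, a sparse format (for instance from scipy.sparse) would be the more memory-friendly choice than a dense array.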
For training and evaluation, execute the following bash commands in the same directory where the code resides:
# Set the PYTHONPATH environment variable
$ export PYTHONPATH="/path/to/this/repo:$PYTHONPATH"
# Train the autoencoder model for network reconstruction
# using only latent features learned from local graph topology.
$ python train_reconstruction.py <dataset_str> <gpu_id>
# Train the autoencoder model for link prediction using
# only latent features learned from local graph topology.
$ python train_lp.py <dataset_str> <gpu_id>
# Train the autoencoder model for link prediction using
# both latent graph features and available explicit node features.
$ python train_lp_with_feats.py <dataset_str> <gpu_id>
# Train the autoencoder model for the multi-task
# learning of both link prediction and semi-supervised
# node classification, simultaneously.
$ python train_multitask_lpnc.py <dataset_str> <gpu_id>
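To make the link prediction task concrete: after training, the reconstructed adjacency scores can be compared on held-out edges versus sampled non-edges, and ranking quality summarized by ROC AUC. The sketch below computes AUC as the probability that a random held-out edge outscores a random non-edge (ties count one half), which equals the area under the ROC curve. `link_prediction_auc` and the toy `a_hat` are hypothetical; whether train_lp.py evaluates in exactly this way is an assumption, not something stated in this README.

```python
import numpy as np


def link_prediction_auc(a_hat, pos_edges, neg_edges):
    """ROC AUC of reconstructed scores a_hat[u, v] on held-out edges vs non-edges."""
    pos = np.array([a_hat[u, v] for u, v in pos_edges])
    neg = np.array([a_hat[u, v] for u, v in neg_edges])
    diff = pos[:, None] - neg[None, :]  # score differences for all positive/negative pairs
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


a_hat = np.array([[0.0, 0.9, 0.2],
                  [0.9, 0.0, 0.4],
                  [0.2, 0.4, 0.0]])
print(link_prediction_auc(a_hat, pos_edges=[(0, 1), (1, 2)], neg_edges=[(0, 2)]))  # 1.0
```

The pairwise form is O(P x N) in memory, which is fine for a few thousand test pairs; for larger evaluations a rank-based formula or a library routine such as scikit-learn's roc_auc_score is more practical.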
The argument <dataset_str> must be one of the following nine supported dataset strings: protein, metabolic, conflict, powergrid, cora, citeseer, pubmed, arxiv-grqc, blogcatalog. The argument <gpu_id> denotes the GPU device ID, which is 0 if only one GPU is available.
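The two positional arguments could be handled along the lines of the sketch below, which validates <dataset_str> against the nine supported strings and passes <gpu_id> to the device selection. `parse_cli` is hypothetical, treating <gpu_id> as optional with a fallback of "0" is an assumption, and so is using the CUDA_VISIBLE_DEVICES environment variable to restrict TensorFlow/Keras to one GPU; the actual scripts may handle these arguments differently.

```python
import os

SUPPORTED_DATASETS = {
    "protein", "metabolic", "conflict", "powergrid",
    "cora", "citeseer", "pubmed", "arxiv-grqc", "blogcatalog",
}


def parse_cli(argv):
    """Hypothetical parser for: python train_*.py <dataset_str> <gpu_id>"""
    if len(argv) < 2:
        raise SystemExit("usage: train_*.py <dataset_str> [<gpu_id>]")
    dataset_str = argv[1]
    if dataset_str not in SUPPORTED_DATASETS:
        raise SystemExit(f"unsupported dataset: {dataset_str}")
    gpu_id = argv[2] if len(argv) > 2 else "0"  # assumption: fall back to GPU 0
    return dataset_str, gpu_id


# Usage with an explicit argv list (a real script would pass sys.argv).
dataset_str, gpu_id = parse_cli(["train_lp.py", "cora", "0"])
# Assumption: make only the chosen GPU visible before the backend is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_id
print(dataset_str, gpu_id)  # cora 0
```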
If you find this work useful, please cite the following:
@inproceedings{Tran-LoNGAE:2018,
author={Tran, Phi Vu},
title={Learning to Make Predictions on Graphs with Autoencoders},
booktitle={5th IEEE International Conference on Data Science and Advanced Analytics},
year={2018}
}