
DCI-pytorch

The PyTorch implementation of Decoupling Representation Learning and Classification for GNN-based Anomaly Detection (SIGIR 2021). Our experiments were run on a Dell server with 2 Intel(R) Xeon(R) Silver 4210 CPUs, 4 NVIDIA TITAN V GPUs (12 GB each), 10 x 32 GB DDR4 RAM, and 1 x 8 TB hard disk.

See our paper for details on the algorithm.

Abstract

GNN-based anomaly detection has recently attracted considerable attention. Existing attempts have thus far focused on jointly learning the node representations and the classifier for detecting the anomalies. Inspired by recent advances in self-supervised learning (SSL) on graphs, we explore another possibility: decoupling the node representation learning and the classification for anomaly detection. Decoupled training can alleviate the negative effects caused by the inconsistency between users' behavior patterns and their label semantics. The proposed SSL scheme, called Deep Cluster Infomax (DCI), can contribute to the decoupled training. In effect, the idea of decoupled training is not restricted to anomaly detection.

If you make use of our idea in your work, please cite the following paper:

@inproceedings{Wang2021decoupling,
  author    = {Yanling Wang and Jing Zhang and Shasha Guo and Hongzhi Yin and Cuiping Li and Hong Chen},
  title     = {Decoupling Representation Learning and Classification for GNN-based Anomaly Detection},
  booktitle = {SIGIR},
  year      = {2021}
}

Requirements

You can create a virtual environment first via:

conda create -n your_env_name python=3.8.5
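
Then activate the environment before installing the dependencies (using the example name above):

conda activate your_env_name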

You can install all the required tools using the following commands:

# CUDA 10.2
conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
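
As an optional sanity check (this snippet is only an illustration and is not part of the repository), you can verify that the expected PyTorch build is installed and that a GPU is visible:

import torch
print(torch.__version__)          # expected: 1.7.0
print(torch.cuda.is_available())  # True if the CUDA 10.2 build detects a GPU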

Overview

Here we provide PyTorch implementations of the different training schemes (i.e., joint training and decoupled training), along with an execution example on the Wiki dataset. For the decoupled training, we provide different instantiations of the SSL loss function, including DGI and DCI. The corresponding training scripts are main_dgi.py and main_dci.py, which can be run as described below.
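
As a rough illustration of how the two schemes differ, the sketch below contrasts joint training with decoupled (pretrain-then-classify) training. The encoder, classifier, ssl_loss, data, and labels objects are hypothetical placeholders, not the actual modules of this repository.

import torch
import torch.nn.functional as F

def train_joint(encoder, classifier, data, labels, epochs=100, lr=0.01):
    # Joint scheme: a single supervised loss drives both the representations
    # and the classifier.
    params = list(encoder.parameters()) + list(classifier.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        z = encoder(data)                               # node embeddings
        loss = F.binary_cross_entropy_with_logits(
            classifier(z).squeeze(-1), labels.float())  # anomaly labels
        loss.backward()
        opt.step()

def train_decoupled(encoder, classifier, ssl_loss, data, labels,
                    pretrain_epochs=100, clf_epochs=100, lr=0.01):
    # Stage 1: self-supervised pre-training (e.g., a DGI- or DCI-style
    # objective); no label information is used here.
    opt_enc = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(pretrain_epochs):
        opt_enc.zero_grad()
        ssl_loss(encoder, data).backward()
        opt_enc.step()

    # Stage 2: train the classifier on top of the pre-trained representations.
    opt_clf = torch.optim.Adam(
        list(encoder.parameters()) + list(classifier.parameters()), lr=lr)
    for _ in range(clf_epochs):
        opt_clf.zero_grad()
        z = encoder(data)
        loss = F.binary_cross_entropy_with_logits(
            classifier(z).squeeze(-1), labels.float())
        loss.backward()
        opt_clf.step()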

Running the code

To run the joint training scheme, execute:

$ python main_dci.py --training_scheme joint --dataset wiki

To run the decoupled training scheme with DGI, execute:

$ python main_dgi.py --dataset wiki

To run the decoupled training scheme with DCI, execute:

$ python main_dci.py --dataset wiki --training_scheme decoupled --num_cluster <number of clusters>

Notes: the optimal <number of clusters> can vary across environments (e.g., different versions of PyTorch); you can use the method suggested in our paper to determine a proper <number of clusters> for your dataset. Besides DGI and DCI, you can also try other graph SSL algorithms. Even though the SSL objective does not rely on task-specific label information, it should be related to your classification task.
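
For a rough starting point when scanning candidate values, one generic heuristic (not necessarily the procedure described in the paper) is to run k-means over pre-trained node embeddings for several cluster counts and look for an elbow in the inertia curve. The sketch below assumes embeddings is a hypothetical (num_nodes, dim) array of node representations and uses scikit-learn, which is not a stated dependency of this repository.

# Generic illustration only; `embeddings` is a hypothetical array of
# pre-trained node representations.
from sklearn.cluster import KMeans

def inertia_curve(embeddings, candidate_ks=(2, 4, 8, 16, 32)):
    """Return the k-means inertia for each candidate number of clusters."""
    scores = {}
    for k in candidate_ks:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
        scores[k] = km.inertia_  # within-cluster sum of squares; look for an elbow
    return scores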
