OpenNE is a sub-project of OpenSKL, providing an Open-source Network Embedding toolkit for network representation learning (NRL), with TADW as a key feature for incorporating the text attributes of nodes.
OpenNE provides a standard training and testing toolkit for network embedding. We unify the input and output interfaces of different NE models and provide scalable options for each model. Moreover, we implement typical NE models based on TensorFlow, which enables these models to be trained with GPUs.
Besides TADW, which learns network embeddings with text attributes, we also implement typical models including DeepWalk, LINE, node2vec, GraRep, GCN, HOPE, GF, SDNE, and LE. If you want to learn more about network embedding, visit our NRL paper list.
To validate the effectiveness of this toolkit, we employ the node classification task for evaluation.
We show the node classification results of various methods on different datasets. We set the representation dimension to 128 and kstep=4 in GraRep. Note that both GCN (a semi-supervised NE model) and TADW need additional text features as inputs; we therefore evaluate these two models on Cora, where each node has text information. We use 10% of the labeled data to train GCN.
Datasets:
Wiki (provided by the LBC project; the original download link is no longer available): 2405 nodes, 17981 edges, 19 labels, directed.
Cora: 2708 nodes, 5429 edges, 7 labels, directed.
Running environment:
BlogCatalog: CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz.
Wiki, Cora: CPU: Intel(R) Core(TM) i5-7267U CPU @ 3.10GHz.
We report the Micro-F1 and Macro-F1 performance to quantify the effectiveness, and the running time for efficiency evaluation. Overall, OpenNE can reproduce the results in the original papers. Our proposed TADW achieves better performance than DeepWalk with the help of text attributes.
Wiki:
Algorithm | Time | Micro-F1 | Macro-F1 |
---|---|---|---|
DeepWalk | 52s | 0.669 | 0.560 |
LINE 2nd | 70s | 0.576 | 0.387 |
node2vec | 32s | 0.651 | 0.541 |
GraRep | 19.6s | 0.633 | 0.476 |
OpenNE(DeepWalk) | 42s | 0.658 | 0.570 |
OpenNE(LINE 2nd) | 90s | 0.661 | 0.521 |
OpenNE(Node2vec) | 33s | 0.655 | 0.538 |
OpenNE(GraRep) | 23.7s | 0.649 | 0.507 |
OpenNE(GraphFactorization) | 12.5s | 0.637 | 0.450 |
OpenNE(HOPE) | 3.2s | 0.601 | 0.438 |
OpenNE(LaplacianEigenmaps) | 4.9s | 0.277 | 0.073 |
OpenNE(SDNE) | 39.6s | 0.643 | 0.498 |
Cora:
Algorithm | Dropout | Weight decay | Hidden units | Dimension | Time | Accuracy |
---|---|---|---|---|---|---|
DeepWalk | - | - | - | 160 | 33.5s | 0.713 |
TADW | - | - | - | 80*2 | 13.9s | 0.780 |
GCN | 0.5 | 5e-4 | 16 | - | 4.0s | 0.790 |
OpenNE(TADW) | - | - | - | 80*2 | 20.8s | 0.791 |
OpenNE(GCN) | 0.5 | 5e-4 | 16 | - | 5.5s | 0.789 |
OpenNE(GCN) | 0 | 5e-4 | 16 | - | 6.1s | 0.779 |
OpenNE(GCN) | 0.5 | 1e-4 | 16 | - | 5.4s | 0.783 |
OpenNE(GCN) | 0.5 | 5e-4 | 64 | - | 6.5s | 0.779 |
Installation:
pip install -r requirements.txt
cd src
python setup.py install
You can check out all of the options available in OpenNE with:
python -m openne --help
To run "node2vec" on BlogCatalog network and evaluate the learned representations on multi-label node classification task, run the following command in the home directory of this project:
python -m openne --method node2vec --label-file data/blogCatalog/bc_labels.txt --input data/blogCatalog/bc_adjlist.txt --graph-format adjlist --output vec_all.txt --q 0.25 --p 0.25
To run "gcn" on Cora network and evaluate the learned representations on multi-label node classification task, run the following command in the home directory of this project:
python -m openne --method gcn --label-file data/cora/cora_labels.txt --input data/cora/cora_edgelist.txt --graph-format edgelist --feature-file data/cora/cora.features --epochs 200 --output vec_all.txt --clf-ratio 0.1
In addition to the common options above, each method has its own hyperparameters (run python -m openne --help for the full list):
DeepWalk and node2vec: random-walk and skip-gram parameters (node2vec additionally takes --p and --q, as used above).
LINE: the proximity order (e.g., the 2nd-order variant evaluated above).
GraRep: the kstep value (set to 4 in our evaluation).
TADW: options for incorporating the text feature file.
GCN: dropout rate, weight decay, and hidden-layer size (see the Cora table above).
GraphFactorization: training options such as the number of epochs.
SDNE: training options for the deep autoencoder.
A GraRep example follows.
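For example, to reproduce the GraRep setting from the evaluation above (kstep=4), pass the method-specific flag alongside the common options. The input and label paths below are placeholders, and we assume the k-step flag is spelled --kstep (confirm with --help):
python -m openne --method grarep --input data/wiki/Wiki_edgelist.txt --graph-format edgelist --label-file data/wiki/Wiki_labels.txt --output vec_all.txt --kstep 4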
The supported input format is an edgelist or an adjlist:
edgelist: node1 node2 <weight_float, optional>
adjlist: node n1 n2 n3 ... nk
The graph is assumed to be undirected and unweighted by default. These options can be changed by setting the appropriate flags.
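For example, a small undirected triangle graph over nodes 0, 1, and 2 (illustrative file contents) looks like this in each format:

edgelist:
```
0 1
0 2
1 2
```

adjlist:
```
0 1 2
1 0 2
2 0 1
```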
If the model needs additional features, the supported feature input format is as follows (each feature_i is a float):
node feature_1 feature_2 ... feature_n
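For instance, a node with a 4-dimensional feature vector would appear as a single line (values are illustrative):
```
0 0.1 0.0 0.73 1.0
```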
The output file has n+1 lines for a graph with n nodes. The first line has the following format:
num_of_nodes dim_of_representation
The next n lines are as follows:
node_id dim1 dim2 ... dimd
where dim1, ..., dimd form the d-dimensional representation learned by OpenNE.
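Given this format, the learned vectors can be read back for downstream use with a few lines of Python. This is a minimal sketch; the helper name `load_embeddings` is ours, not part of OpenNE:

```python
import numpy as np

def load_embeddings(path):
    """Parse an OpenNE output file into a {node_id: vector} dict."""
    with open(path) as f:
        # First line: number of nodes and representation dimension.
        num_nodes, dim = map(int, f.readline().split())
        embeddings = {}
        # Each remaining line: node_id followed by d floats.
        for line in f:
            node_id, *values = line.split()
            embeddings[node_id] = np.array(values, dtype=np.float32)
    assert len(embeddings) == num_nodes
    assert all(v.shape == (dim,) for v in embeddings.values())
    return embeddings

vectors = load_embeddings("vec_all.txt")
```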
If you want to evaluate the learned node representations, you can provide node labels. OpenNE will use a portion of the nodes (default: 50%) to train a classifier and compute F1-scores on the remaining nodes.
The supported input label format is
node label1 label2 label3...
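As a concrete illustration of this protocol, the following sketch trains a one-vs-rest logistic regression on half of the nodes and reports Micro-F1 and Macro-F1 on the rest. It reuses the hypothetical `load_embeddings` helper from above; the label-file path and the classifier choice are our assumptions, not OpenNE internals:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Parse the label file: each line is "node label1 label2 ...".
labels = {}
with open("data/blogCatalog/bc_labels.txt") as f:
    for line in f:
        node, *ls = line.split()
        labels[node] = ls

# load_embeddings: the parser sketched in the output-format section above.
vectors = load_embeddings("vec_all.txt")
nodes = sorted(labels)
X = np.stack([vectors[n] for n in nodes])
Y = MultiLabelBinarizer().fit_transform([labels[n] for n in nodes])

# Train on 50% of the nodes, evaluate on the other 50%.
X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.5, random_state=0)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)
pred = clf.predict(X_te)
print("Micro-F1:", f1_score(Y_te, pred, average="micro"))
print("Macro-F1:", f1_score(Y_te, pred, average="macro"))
```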
To show how to apply dimensionality reduction methods such as t-SNE and PCA to embedding visualization, we choose the 20 newsgroups dataset. Using the text features, we build the news network with kneighbors_graph
from scikit-learn. We upload the results of different methods in t-SNE-PCA.pptx, where node colors represent node labels. A simple script is shown as follows:
cd visualization_example
python 20newsgroup.py
tensorboard --logdir=log/
After starting TensorBoard, visit localhost:6006 to view the results.
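If you prefer a quick matplotlib plot over TensorBoard, here is a minimal t-SNE sketch over the learned embeddings, again reusing the hypothetical `load_embeddings` helper (coloring by label is omitted for brevity):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# load_embeddings: the parser sketched in the output-format section above.
vectors = load_embeddings("vec_all.txt")
nodes = sorted(vectors)
X = np.stack([vectors[n] for n in nodes])

# Project the d-dimensional embeddings down to 2D for plotting.
coords = TSNE(n_components=2, random_state=0).fit_transform(X)
plt.scatter(coords[:, 0], coords[:, 1], s=5)
plt.title("t-SNE projection of OpenNE embeddings")
plt.show()
```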
If you find OpenNE is useful for your research, please consider citing the following papers:
@inproceedings{perozzi2014deepwalk,
  title = {DeepWalk: Online learning of social representations},
  author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
  booktitle = {Proceedings of KDD},
  pages = {701--710},
  year = {2014}
}
@inproceedings{tang2015line,
  title = {LINE: Large-scale information network embedding},
  author = {Tang, Jian and Qu, Meng and Wang, Mingzhe and Zhang, Ming and Yan, Jun and Mei, Qiaozhu},
  booktitle = {Proceedings of WWW},
  pages = {1067--1077},
  year = {2015}
}
@inproceedings{grover2016node2vec,
  title = {node2vec: Scalable feature learning for networks},
  author = {Grover, Aditya and Leskovec, Jure},
  booktitle = {Proceedings of KDD},
  pages = {855--864},
  year = {2016}
}
@article{kipf2016semi,
  title = {Semi-supervised classification with graph convolutional networks},
  author = {Kipf, Thomas N and Welling, Max},
  journal = {arXiv preprint arXiv:1609.02907},
  year = {2016}
}
@inproceedings{cao2015grarep,
  title = {GraRep: Learning graph representations with global structural information},
  author = {Cao, Shaosheng and Lu, Wei and Xu, Qiongkai},
  booktitle = {Proceedings of CIKM},
  pages = {891--900},
  year = {2015}
}
@inproceedings{yang2015network,
  title = {Network representation learning with rich text information},
  author = {Yang, Cheng and Liu, Zhiyuan and Zhao, Deli and Sun, Maosong and Chang, Edward},
  booktitle = {Proceedings of IJCAI},
  year = {2015}
}
@article{tu2017network,
  title = {Network representation learning: an overview},
  author = {Tu, Cunchao and Yang, Cheng and Liu, Zhiyuan and Sun, Maosong},
  journal = {SCIENTIA SINICA Informationis},
  volume = {47},
  number = {8},
  pages = {980--996},
  year = {2017}
}
@inproceedings{ou2016asymmetric,
  title = {Asymmetric transitivity preserving graph embedding},
  author = {Ou, Mingdong and Cui, Peng and Pei, Jian and Zhang, Ziwei and Zhu, Wenwu},
  booktitle = {Proceedings of KDD},
  pages = {1105--1114},
  year = {2016}
}
@inproceedings{belkin2002laplacian,
  title = {Laplacian eigenmaps and spectral techniques for embedding and clustering},
  author = {Belkin, Mikhail and Niyogi, Partha},
  booktitle = {Proceedings of NIPS},
  pages = {585--591},
  year = {2002}
}
@inproceedings{ahmed2013distributed,
  title = {Distributed large-scale natural graph factorization},
  author = {Ahmed, Amr and Shervashidze, Nino and Narayanamurthy, Shravan and Josifovski, Vanja and Smola, Alexander J},
  booktitle = {Proceedings of WWW},
  pages = {37--48},
  year = {2013}
}
@inproceedings{wang2016structural,
  title = {Structural deep network embedding},
  author = {Wang, Daixin and Cui, Peng and Zhu, Wenwu},
  booktitle = {Proceedings of KDD},
  pages = {1225--1234},
  year = {2016}
}
The OpenSKL project aims to harness the power of both structured knowledge and natural languages via representation learning. All sub-projects of OpenSKL, under the categories of Algorithm, Resource, and Application, are as follows.