This package includs 5 different SimRank implementations: DFS (depth-first search) MapReduce, naive MapReduce, delta MapReduce, matrix multiplication and PageRank-like Random Walk with Restart. You can choose different implementation through configuration.
This implementation is compatible with Spark 0.8.1+ version, you can compile using sbt assembly
,
before that please configure the correct Hadoop version in build.sbt
.
graph_generate.py
to generate random adjacency matrix, you can configure GRAPH_SIZE
(number of vertices), EDGE_SIZE
(number of edges) to control the matrix rank, this script will
serialize matrix to file../run simrank.SimRankDataPrepare
to generate data, it
should be noted that two parameters graphASize
and graphBSize
, which specifies the
vertices number of two sub-graphs in the bipartite graph, should be the same as step 1's generated
result.config/config.properties
and run by ./run simrank.SimRankImpl
.This implementation is open sourced under Apache License Version 2.0.