nebula-algorithm is a Spark application based on GraphX. It currently provides the following algorithms:
Name | Use Case |
---|---|
PageRank | page ranking, key node mining |
Louvain | community detection, hierarchical clustering |
KCore | community detection, financial risk control |
LabelPropagation | community detection, information propagation, advertising recommendation |
Hanp | community detection, information propagation |
ConnectedComponent | community detection, isolated island detection |
StronglyConnectedComponent | community detection |
ShortestPath | path planning, network planning |
TriangleCount | network structure analysis |
GraphTriangleCount | network structure and tightness analysis |
BetweennessCentrality | key node mining, node influence calculation |
ClosenessCentrality | key node mining, node influence calculation |
DegreeStatic | graph structure analysis |
ClusteringCoefficient | recommendation, telecom fraud analysis |
Jaccard | similarity calculation, recommendation |
BFS | sequence traversal, shortest path planning |
DFS | sequence traversal, shortest path planning |
Node2Vec | graph machine learning, recommendation |
You can either submit the whole Spark application or call the algorithms in the `lib` package directly to run graph algorithms on a DataFrame.
Build Nebula Algorithm
$ git clone https://github.com/vesoft-inc/nebula-algorithm.git
$ cd nebula-algorithm
$ mvn clean package -Dgpg.skip -Dmaven.javadoc.skip=true -Dmaven.test.skip=true
After the build finishes, the target file `nebula-algorithm-3.0-SNAPSHOT.jar` is placed under `nebula-algorithm/target`.
Download from Maven repo
Alternatively, the package can be downloaded from the following Maven repo:
Option 1: Submit nebula-algorithm package
Refer to the configuration example.
${SPARK_HOME}/bin/spark-submit --master <mode> --class com.vesoft.nebula.algorithm.Main nebula-algorithm-3.0-SNAPSHOT.jar -p application.conf
Option 2: Call nebula-algorithm interface
The `lib` package of nebula-algorithm provides 10+ algorithms that can be called programmatically, as follows:
Add the dependency to your `pom.xml`:

    <dependency>
        <groupId>com.vesoft</groupId>
        <artifactId>nebula-algorithm</artifactId>
        <version>3.0.0</version>
    </dependency>

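Then call the algorithm, taking PageRank as an example:

    import com.vesoft.nebula.algorithm.config.{Configs, PRConfig, SparkConfig}
    import com.vesoft.nebula.algorithm.lib.PageRankAlgo
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Local Spark session for the example.
    val spark = SparkSession.builder().master("local").getOrCreate()
    // Edge list read from CSV; the first row is treated as a header.
    val data = spark.read.option("header", true).csv("src/test/resources/edge.csv")
    val prConfig = new PRConfig(5, 1.0)
    val prResult = PageRankAlgo.apply(spark, data, prConfig, false)
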
If your vertex IDs are strings, set `encodeId = true` in the algorithm config; see the [DegreeStatic example](https://github.com/vesoft-inc/nebula-algorithm/tree/master/example/src/main/scala/com/vesoft/nebula/algorithm/DegreeStaticExample.scala).
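
GraphX identifies vertices by 64-bit `Long` values, so string vertex IDs have to be mapped to numbers before a GraphX-based algorithm can run. The plain-Scala sketch below illustrates the general idea of such an encoding. It is not the library's actual implementation, and the sample edges and the names `idToLong`, `encodedEdges`, and `longToId` are hypothetical.

```scala
// Hypothetical edge list with string vertex IDs: (source, target, weight).
val edges: Seq[(String, String, Double)] = Seq(
  ("alice", "bob", 1.0),
  ("bob", "carol", 2.0),
  ("carol", "alice", 0.5)
)

// Give every distinct string ID a numeric ID, in order of first appearance.
val idToLong: Map[String, Long] =
  edges
    .flatMap { case (src, dst, _) => Seq(src, dst) }
    .distinct
    .zipWithIndex
    .map { case (id, idx) => id -> idx.toLong }
    .toMap

// Edges rewritten with numeric IDs, the form a GraphX-based algorithm needs.
val encodedEdges: Seq[(Long, Long, Double)] =
  edges.map { case (src, dst, w) => (idToLong(src), idToLong(dst), w) }

// Reverse lookup, used to translate results back to the original string IDs.
val longToId: Map[Long, String] = idToLong.map(_.swap)

println(encodedEdges)
```

The reverse map matters because algorithm results are keyed by the numeric IDs and usually need to be reported against the original string IDs.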
For examples of other algorithms, see [examples](https://github.com/vesoft-inc/nebula-algorithm/tree/master/example/src/main/scala/com/vesoft/nebula/algorithm)
> Note: The first column of the input DataFrame holds the source vertices, the second holds the target vertices, and the third holds the edge weights.
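
As a rough illustration of the layout described in the note, the following sketch builds a small three-column edge DataFrame with Spark. The column names `src`, `dst`, and `weight`, the application name, and the sample rows are assumptions made for this example; according to the note, it is the column position that carries the meaning.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("edge-layout-sketch") // hypothetical application name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample edges: (source vertex, target vertex, edge weight).
val edges = Seq(
  (1L, 2L, 1.0),
  (2L, 3L, 0.5),
  (3L, 1L, 2.0)
).toDF("src", "dst", "weight")

// Column 1 = source, column 2 = target, column 3 = weight.
edges.printSchema()
edges.show()
```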
If you want to write the algorithm results into NebulaGraph (sink: `nebula`), make sure your tag definition contains the corresponding property name:
Algorithm | property name | property type |
---|---|---|
pagerank | pagerank | double/string |
louvain | louvain | int/string |
kcore | kcore | int/string |
labelpropagation | lpa | int/string |
connectedcomponent | cc | int/string |
stronglyconnectedcomponent | scc | int/string |
betweenness | betweenness | double/string |
shortestpath | shortestpath | string |
degreestatic | degree,inDegree,outDegree | int/string |
trianglecount | trianglecount | int/string |
clusteringcoefficient | clustercoefficient | double/string |
closeness | closeness | double/string |
hanp | hanp | int/string |
bfs | bfs | string |
dfs | dfs | string |
jaccard | jaccard | string |
node2vec | node2vec | string |
NebulaGraph Algorithm Version | NebulaGraph Version | Spark Version |
---|---|---|
2.0.0 | 2.0.0, 2.0.1 | 2.4 |
2.1.0 | 2.0.0, 2.0.1 | 2.4 |
2.5.0 | 2.5.0, 2.5.1 | 2.4 |
2.6.0 | 2.6.0, 2.6.1 | 2.4 |
2.6.1 | 2.6.0, 2.6.1 | 2.4 |
2.6.2 | 2.6.0, 2.6.1 | 2.4 |
3.0.0, 3.1.x | 3.0.x, 3.1.x, 3.2.x, 3.3.x | 2.4 |
3.0-SNAPSHOT | nightly | 2.4 |
Nebula Algorithm is open source; you are more than welcome to contribute in the following ways: