vesoft-inc / nebula-algorithm

Nebula-Algorithm is a Spark application based on GraphX that enables state-of-the-art graph algorithms to run on top of NebulaGraph and writes the results back to NebulaGraph.

Spark 3.0 support #50

Closed wey-gu closed 6 months ago

wey-gu commented 2 years ago

like https://github.com/vesoft-inc/nebula-exchange/pull/41

Update (April 2023): the spark connector now supports Spark 3.0.

Nicole00 commented 2 years ago

It depends on the spark connector supporting Spark 3.0, which is not implemented yet but is on the schedule.

xiajingchun commented 1 year ago

@Nicole00 I noticed there's already a pull request for the connector supporting Spark 3.0. Once that one is done, is any further work needed here in order to run the algorithms on Spark 3?

porscheme commented 1 year ago

Any update on this?

We cannot use nebula-algorithm since our Spark-Operator framework is spark 3.0.

Nicole00 commented 1 year ago

Any update on this?

We cannot use nebula-algorithm since our Spark-Operator framework is spark 3.0.

At present, you can work around it like this: pull the branch, execute maven install for nebula-spark-connector_3.0, and update the spark connector version referenced in nebula-algorithm.

meet-alfie commented 10 months ago

Any update on this?

We cannot use nebula-algorithm since our Spark-Operator framework is spark 3.0.

I also encountered the same problem. The Spark version used by our platform is 3.x (3.2.2). My job does not read Nebula data directly; it uses the results of business queries against Nebula, which can be thought of as plain data containing source vertices, destination vertices, and weights. I extracted only the parts of the nebula and nebula-spark-connector source code that the algorithm uses into my own project, declared my own dependencies, and can run algorithms like PageRank normally.

My main modifications are as follows:

  1. source file
    
    ├── base
    │   └── client
    │       ├── meta_data
    │       │   ├── FieldMetaData.java
    │       │   └── FieldValueMetaData.java
    │       ├── protocol
    │       │   ├── ShortStack.java
    │       │   ├── TCompactProtocol.java
    │       │   ├── TException.java
    │       │   ├── TField.java
    │       │   ├── TList.java
    │       │   ├── TMap.java
    │       │   ├── TMessage.java
    │       │   ├── TProtocol.java
    │       │   ├── TProtocolException.java
    │       │   ├── TProtocolFactory.java
    │       │   ├── TSet.java
    │       │   ├── TStruct.java
    │       │   └── TTransportException.java
    │       ├── schema
    │       │   ├── IScheme.java
    │       │   ├── SchemeFactory.java
    │       │   └── StandardScheme.java
    │       ├── thrift
    │       │   └── TBase.java
    │       └── transport
    │           ├── TException.java
    │           ├── TTransport.java
    │           └── TTransportException.java
    ├── config
    │   ├── AlgoConfig.scala
    │   └── SparkConfigEntry.scala
    ├── examples
    │   └── PageRankExample.scala
    ├── lib
    │   └── PageRankAlgo.scala
    ├── reader
    │   └── ReadData.scala
    └── utils
        ├── DecodeUtil.scala
        └── NebulaUtil.scala

13 directories, 29 files


2. pom.xml
    <properties>
       <maven.compiler.source>8</maven.compiler.source>
       <maven.compiler.target>8</maven.compiler.target>
       <scala.version>2.12</scala.version>
       <spark.version>3.2.2</spark.version>
       <lombok.version>1.18.28</lombok.version>
       <config.version>1.4.0</config.version>
       <scopt.version>3.7.1</scopt.version>
     </properties>

     <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.version}</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-graphx_${scala.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>com.typesafe</groupId>
        <artifactId>config</artifactId>
        <version>${config.version}</version>
    </dependency>
    <dependency>
        <groupId>com.github.scopt</groupId>
        <artifactId>scopt_${scala.version}</artifactId>
        <version>${scopt.version}</version>
    </dependency>

Just follow this approach to add your own algorithm source code and update your dependencies.
Hope it helps.
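
The approach above boils down to running GraphX directly on plain (source, destination, weight) edge triples, bypassing the Nebula reader entirely. A minimal sketch of that idea on Spark 3.2.x (the object name, edge values, and local master are made up for illustration; real data would come from your business queries against NebulaGraph):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.{Edge, Graph}

object PageRankSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the master comes from spark-submit.
    val spark = SparkSession.builder()
      .appName("pagerank-sketch")
      .master("local[*]")
      .getOrCreate()

    // Plain (srcId, dstId, weight) triples; values here are made up.
    val edges = spark.sparkContext.parallelize(Seq(
      Edge(1L, 2L, 1.0),
      Edge(2L, 3L, 1.0),
      Edge(3L, 1L, 0.5)
    ))

    // Build the graph with a default vertex attribute and run GraphX PageRank
    // until convergence within the given tolerance.
    val graph = Graph.fromEdges(edges, defaultValue = 1.0)
    val ranks = graph.pageRank(tol = 0.0001).vertices

    ranks.collect().sortBy(_._1).foreach { case (id, rank) =>
      println(s"vertex $id -> rank $rank")
    }
    spark.stop()
  }
}
```

Since GraphX ships with Spark itself, none of the connector code is needed once the edge data already lives outside NebulaGraph.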
xin-hao-awx commented 9 months ago

Can we take this as a higher priority?

Nicole00 commented 6 months ago

https://github.com/vesoft-inc/nebula-algorithm/tree/spark3