vesoft-inc / nebula-algorithm

Nebula-Algorithm is a Spark application based on GraphX that runs state-of-the-art graph algorithms on top of NebulaGraph and writes the results back to NebulaGraph.

Can we have Node2Vec example #49

Closed porscheme closed 1 year ago

porscheme commented 2 years ago

As the subject says, can we have a Node2Vec example? @wey-gu

wey-gu commented 2 years ago

I will look into this and come back with an example ;)

wey-gu commented 2 years ago

Dear @porscheme

Today I had the bandwidth to run node2vec for you. Here is an example: https://gist.github.com/wey-gu/53e35bc2da571a919f4f0c248c5dd9fc

sunkararp commented 2 years ago

Which Spark/Scala version should I use to run nebula-algorithm?

Can you provide a Spark 3.0.0-compatible version?

wey-gu commented 2 years ago

Which Spark/Scala version should I use to run nebula-algorithm?

Can you provide a Spark 3.0.0-compatible version?

For now, only Spark 2.4.x is supported, as documented. Could you use 2.4.x first?

I noticed that nebula-exchange added Spark 3.0.0 support in https://github.com/vesoft-inc/nebula-exchange/pull/41. The equivalent work is not yet planned for nebula-algorithm, but I created an issue for it just now.

sunkararp commented 2 years ago

We normally use Spark 3.2.1 but downgraded to Spark 2.4.6 and Scala 2.11.12 for Nebula. We are getting the error below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
        at java.base/java.lang.ClassLoader.defineClass1(Native Method)
        at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
        at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
        at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
        at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
         $NebulaDataFrameReader.loadEdgesToDF(package.scala:146)
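
Two details in this trace may help narrow things down. The missing class, org.apache.spark.sql.sources.v2.ReadSupport, belongs to the Spark 2.x DataSource V2 API and is no longer present in Spark 3.x, and the java.base/ prefix on the frames is only printed by JDK 9 or later, while Spark 2.4 is documented for Java 8. Together these suggest (without proving) that the job was not actually running against a Spark 2.4 classpath on Java 8. The sketch below is a rough way to check both points; the function names java_major and trace_from_modular_jdk are hypothetical helpers, not part of any Nebula or Spark tooling.

```shell
# Hypothetical helpers for checking the runtime behind a stack trace.

# Map a Java version string to its major version:
# "1.8.0_252" -> 8, "11.0.15" -> 11.
java_major() (
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy "1.x" scheme used up to Java 8
  esac
  printf '%s\n' "${v%%[._-]*}"
)

# Read a stack trace on stdin; succeed if it contains module-qualified
# frames such as "java.base/...", which only JDK 9+ prints.
trace_from_modular_jdk() {
  grep -q 'java\.base/'
}

# Usage (not executed here):
#   java_major "$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p')"
#   trace_from_modular_jdk < error.log && echo "trace came from JDK 9+"
```

Running spark-submit --version on the submitting machine is a quick way to confirm which Spark distribution is actually being picked up.
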
sunkararp commented 2 years ago

It's blocking us from making any progress; can you expedite support for Spark 3?

wey-gu commented 2 years ago

It's blocking us from making any progress; can you expedite support for Spark 3?

@Nicole00, could you please help explain why java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport occurs with Spark 2.4.6 and Scala 2.11.12?

wey-gu commented 2 years ago

@sunkararp Until @Nicole00 can look into it, maybe you could compare your setup against my nebula-up playground environment to see the differences?

https://github.com/wey-gu/nebula-up/

After running curl -fsSL nebula-up.siwei.io/all-in-one.sh | bash -s -- v3 spark on a machine with Docker, you will have NebulaGraph plus Spark 2.4.

Then ~/.nebula-up/nebula-algo-pagerank-example.sh will run PageRank in the Spark container. You could enter the container with docker exec -it spark_master_1 bash to check how it differs from your 2.4.6 setup.
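
To make that comparison concrete, one option is to copy the jar directory out of the playground container (for example with docker cp) and diff its file names against your own installation. A minimal sketch follows; list_jars and diff_jars are hypothetical helper names, and the location of the jar directory inside the container is something to look up rather than assume.

```shell
# Hypothetical helpers for comparing two Spark jar directories.

# Print the names of the .jar files in a directory, sorted.
list_jars() (
  cd "$1" || exit 1
  ls -1 | grep '\.jar$' | sort
)

# Print jars found only in the first directory (column 1) and jars found
# only in the second directory (column 2, tab-indented).
diff_jars() (
  a="$(mktemp)"; b="$(mktemp)"
  list_jars "$1" > "$a"
  list_jars "$2" > "$b"
  comm -3 "$a" "$b"
  rm -f "$a" "$b"
)

# Usage (not executed here):
#   diff_jars ./playground-jars "$SPARK_HOME/jars"
```

Jars whose names differ only in their version or Scala suffix are usually the most interesting lines in the output.
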

sunkararp commented 2 years ago

I'm finally able to run it on Spark 2.4.6, Scala 2.11.12, and OpenJDK 64-Bit 1.8.0_252.

But I'm getting a java.lang.NullPointerException.

Can you please look into this ASAP?

Below is my spark-submit command:

spark-submit \
  --master "spark://10.155.48.35:7077" \
  --conf spark.driver.extraClassPath=/home/jovyan/* \
  --conf spark.executor.extraClassPath=/home/jovyan/* \
  --conf spark.executor.instances=3 \
  --conf spark.executor.memory=16G \
  --conf spark.driver.maxResultSize=10G \
  --conf spark.driver.host=10.155.50.21 \
  --class com.vesoft.nebula.algorithm.Main \
  --packages "com.vesoft:nebula-spark-connector:3.0.0,org.apache.spark:spark-core_2.11:2.4.4,org.apache.spark:spark-sql_2.11:2.4.4,com.github.scopt:scopt_2.11:3.7.1,com.typesafe:config:1.4.0,org.apache.spark:spark-mllib_2.11:2.4.4" \
  --deploy-mode client \
  nebula-algorithm-3.0.0.jar -p dev.algorithm.conf
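
One thing that stands out in this command is that --packages pins Spark 2.4.4 artifacts while the installation is described as 2.4.6. Whether that relates to the NullPointerException is not established in this thread, but it is cheap to check that the coordinates agree with each other and with the cluster. A small sketch follows; scala_suffixes and spark_versions are hypothetical helper names that only parse the coordinate string.

```shell
# Hypothetical helpers for inspecting a --packages coordinate list
# (comma-separated group:artifact:version entries).

# Print the distinct Scala binary suffixes (for example 2.11) used by the artifacts.
scala_suffixes() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -n 's/^[^:]*:[^:]*_\([0-9][0-9]*\.[0-9][0-9]*\):.*$/\1/p' | sort -u
}

# Print the distinct versions of the org.apache.spark artifacts.
spark_versions() {
  printf '%s\n' "$1" | tr ',' '\n' |
    sed -n 's/^org\.apache\.spark:[^:]*:\(.*\)$/\1/p' | sort -u
}

# Usage (not executed here):
#   PKGS="org.apache.spark:spark-core_2.11:2.4.4,com.github.scopt:scopt_2.11:3.7.1"
#   scala_suffixes "$PKGS"   # more than one line means mixed Scala versions
#   spark_versions "$PKGS"   # compare with the output of spark-submit --version
```

A standard Spark distribution already ships its own core, SQL, and MLlib jars, so listing them again under --packages is usually unnecessary.
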

Below is my conf file:

{
  spark: {
    app: {
      name: My Graph Algorithm 1.0
      partitionNum: 100
    }
    master: local
  }

  data: {
    source: nebula
    sink: nebula
    hasWeight: false
  }

  nebula: {
    read: {
      graphAddress: "10.0.195.64:9669"
      metaAddress: "10.0.213.158:9559"
      space: StudentCentral
      user: root
      pswd: nebula
      labels: ["STUDENT_HAS_CLASS_TCODE"]
    }

    write: {
      graphAddress: "10.0.195.64:9669"
      metaAddress: "10.0.213.158:9559"
      user: root
      pswd: nebula
      space: StudentCentral
      tag: Student
      type: update
    }
  }

  algorithm: {
    executeAlgo: node2vec
    node2vec: {
      maxIter: 10,
      lr: 0.025,
      dataNumPartition: 10,
      modelNumPartition: 10,
      dim: 10,
      window: 3,
      walkLength: 1,
      numWalks: 3,
      p: 1.0,
      q: 1.0,
      directed: true,
      degree: 30,
      embSeparate: ",",
      modelPath: "hdfs://namenode:9000/model"
    }
  }
}
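
In this conf, executeAlgo names the algorithm and a block with the same name carries its parameters. Before submitting, a quick text-level sanity check can confirm that the two match. The sketch below is only that, a grep-based check rather than a real HOCON parser; selected_algo and has_algo_block are hypothetical helper names, and it says nothing about the cause of the NullPointerException.

```shell
# Hypothetical text-level checks for an algorithm conf file (not a HOCON parser).

# Print the algorithm named by the first executeAlgo entry.
selected_algo() {
  sed -n 's/^[[:space:]]*executeAlgo[[:space:]]*:[[:space:]]*\([A-Za-z0-9_]*\).*$/\1/p' "$1" |
    head -n 1
}

# Succeed when the conf also opens a "<algo>: {" parameter block.
has_algo_block() (
  algo="$(selected_algo "$1")"
  [ -n "$algo" ] && grep -Eq "^[[:space:]]*${algo}[[:space:]]*:[[:space:]]*\{" "$1"
)

# Usage (not executed here):
#   has_algo_block dev.algorithm.conf || echo "no parameter block for $(selected_algo dev.algorithm.conf)"
```
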
sunkararp commented 2 years ago
wey-gu commented 2 years ago

Great to see your explorations and results 👍🏻. Sorry I couldn't help you with them.

sunkararp commented 2 years ago