Closed porscheme closed 1 year ago
Will look into this and come up with an example ;)
Dear @porscheme
Today I got the bandwidth to run node2vec for you, as an example:
What version of Spark/Scala should I be using to run nebula-algorithm?
Can you provide a Spark 3.0.0 compatible version?
For now, it's 2.4.x only, as documented. Could you possibly use 2.4.x first?
I noticed nebula-exchange supports 3.0.0, but the equivalent work is not yet planned in nebula-algorithm. I created an issue for it just now.
We normally use Spark 3.2.1 but downgraded to Spark 2.4.6 and Scala 2.11.12 for Nebula. We are getting the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(
at java.base/
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(
at java.base/java.lang.ClassLoader.loadClass(
It's blocking us from making any progress; can you expedite support for Spark 3?
@Nicole00 could you help point out why java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
is encountered with Spark 2.4.6 and Scala 2.11.12, please?
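One way to narrow this down while waiting: as far as I know, org.apache.spark.sql.sources.v2.ReadSupport belongs to the DataSource V2 API of Spark 2.3/2.4 and was removed in Spark 3.0, so this error usually suggests that the Spark jars actually on the runtime classpath are not the 2.4.x ones. The java.base/ frames in the trace also indicate a Java 9+ JVM, whereas Spark 2.4.x targets Java 8. Below is a minimal sketch that scans a directory of jars for the missing class; the function name and the idea of pointing it at your Spark jars directory are assumptions for illustration, not part of nebula-algorithm.

```python
import zipfile
from pathlib import Path
from typing import List


def jars_providing(class_name: str, jar_dir: str) -> List[str]:
    """Return the file names of jars in jar_dir that contain class_name.

    Hypothetical helper for illustration; jar_dir is whatever directory
    ends up on your driver/executor classpath.
    """
    # A JVM class a.b.C is stored inside a jar as the zip entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Not a valid archive (for example a partial download); skip it.
            continue
    return hits
```

Calling jars_providing("org.apache.spark.sql.sources.v2.ReadSupport", "/path/to/spark/jars") (the path is a placeholder) and getting an empty list would be consistent with a Spark 3.x installation still being picked up.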
@sunkararp Before @Nicole00 can help look into it, maybe you could refer to my nebula-up playground environment to see the differences?
After running curl -fsSL | bash -s -- v3 spark
on a machine with Docker, you will have NebulaGraph + Spark 2.4.
Then ~/.nebula-up/
will run PageRank in the Spark container. You could enter the Spark container with docker exec -it spark_master_1 bash
to check how it differs from your 2.4.6 setup.
I'm finally able to run on Spark 2.4.6, Scala 2.11.12 and OpenJDK 64-Bit 1.8.0_252.
But I'm getting java.lang.NullPointerException.
Can you please look into this ASAP?
Below is my spark-submit command:
spark-submit --master "spark://" --conf spark.driver.extraClassPath=/home/jovyan/* --conf spark.executor.extraClassPath=/home/jovyan/* --conf spark.executor.instances=3 --conf spark.executor.memory=16G --conf spark.driver.maxResultSize=10G --conf --class com.vesoft.nebula.algorithm.Main --packages "com.vesoft:nebula-spark-connector:3.0.0,org.apache.spark:spark-core_2.11:2.4.4,org.apache.spark:spark-sql_2.11:2.4.4,com.github.scopt:scopt_2.11:3.7.1,com.typesafe:config:1.4.0,org.apache.spark:spark-mllib_2.11:2.4.4" --deploy-mode client nebula-algorithm-3.0.0.jar -p dev.algorithm.conf
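A quick sanity check that may help with commands like this one: the --packages list pins the Spark artifacts to 2.4.4 while the cluster is reported as 2.4.6. That patch-level gap may well be harmless, but it is cheap to check mechanically. The sketch below is a hypothetical helper (its name and warning wording are my own) that parses group:artifact:version coordinates and flags Scala-suffix or Spark-version mismatches.

```python
import re
from typing import List

# Matches a Scala binary version used as an artifact suffix, e.g. "2.11".
_SCALA_SUFFIX = re.compile(r"^\d+\.\d+$")


def check_packages(packages: str, spark_version: str, scala_binary: str) -> List[str]:
    """Return warnings for a comma-separated --packages string.

    Hypothetical helper: it only compares strings and knows nothing
    about real binary compatibility between versions.
    """
    warnings = []
    for coord in packages.split(","):
        parts = coord.strip().split(":")
        if len(parts) != 3:
            warnings.append(f"unparseable coordinate: {coord!r}")
            continue
        group, artifact, version = parts
        # Scala libraries carry the binary version as a suffix (spark-sql_2.11).
        if "_" in artifact:
            suffix = artifact.rsplit("_", 1)[1]
            if _SCALA_SUFFIX.match(suffix) and suffix != scala_binary:
                warnings.append(f"{artifact}: built for Scala {suffix}, expected {scala_binary}")
        # Spark's own artifacts are versioned with the Spark release.
        if group == "org.apache.spark" and version != spark_version:
            warnings.append(f"{artifact}: version {version}, cluster runs {spark_version}")
    return warnings
```

With the coordinates from the command above, spark_version="2.4.6" and scala_binary="2.11", this flags the three org.apache.spark artifacts (2.4.4 vs 2.4.6) and nothing else.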
Below is my conf file:
spark: {
app: {
name: My Graph Algorithm 1.0
data: {
source: nebula
sink: nebula
hasWeight: false
nebula: {
read: {
graphAddress: ""
metaAddress: ""
space: StudentCentral
graphAddress: ""
metaAddress: ""
algorithm: {
executeAlgo: node2vec
maxIter: 10,
lr: 0.025,
dataNumPartition: 10,
modelNumPartition: 10,
dim: 10,
window: 3,
walkLength: 1,
numWalks: 3,
p: 1.0,
q: 1.0,
directed: true,
degree: 30,
embSeparate: ",",
modelPath: "hdfs://namenode:9000/model"
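To make the walk-related settings above (walkLength, numWalks, p, q) more concrete, here is a minimal single-machine sketch of node2vec's biased random walk in plain Python. It is not the nebula-algorithm implementation; the function names and the adjacency-dict input are assumptions for illustration. In this sketch walk_length counts nodes, so walk_length=1 yields walks containing only the start node; I have not verified whether nebula-algorithm counts nodes or steps. In node2vec generally, the resulting walks are then fed to a word2vec-style model, which is where settings such as dim, window, lr and maxIter would apply.

```python
import random
from typing import Dict, Hashable, List, Set


def node2vec_walk(adj: Dict[Hashable, Set], start, walk_length: int,
                  p: float, q: float, rng: random.Random) -> List:
    """One second-order biased walk starting at `start` (illustrative sketch)."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj.get(cur, ()))
        if not neighbors:
            break  # dead end, e.g. a node with no outgoing edges
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)       # stay close to the previous node
            else:
                weights.append(1.0 / q)   # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def generate_walks(adj: Dict[Hashable, Set], num_walks: int, walk_length: int,
                   p: float = 1.0, q: float = 1.0, seed: int = 0) -> List[List]:
    """Run num_walks walks from every node, visiting nodes in shuffled order."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = sorted(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(adj, node, walk_length, p, q, rng))
    return walks
```

Because each transition weight is computed on the fly from the current and previous node, this sketch keeps only the adjacency dict in memory, at the cost of more work per step than precomputed sampling tables.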
Great to see your explorations and results 👍🏻, sorry I couldn't help you with them.
The implementation wasn't using Spark worker nodes; is that a known issue?
- Could you please help look into this, @Nicole00?
For large data, we are getting a java.lang.OutOfMemoryError: GC overhead limit exceeded exception. Do you think it's due to a memory leak, or did this physically consume your cluster's memory?
This implementation works for small datasets.
For large datasets, you need a huge amount of memory to process them. Also, it doesn't use Spark's capabilities.
Do you have any modifications for huge datasets? Any Pregel-based solutions?
As the subject says, can we have a Node2Vec example? @wey-gu