Closed by porscheme 1 year ago
Will look into this and come back with an example ;)
Dear @porscheme
Today I got the bandwidth to run node2vec for you; here it is as an example: https://gist.github.com/wey-gu/53e35bc2da571a919f4f0c248c5dd9fc
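In case the gist is hard to reach, below is a minimal Scala sketch of the same idea through the library API. Treat the class and constructor names (Node2vecAlgo, Node2vecConfig under com.vesoft.nebula.algorithm) and the field order as my assumption of the current API; the gist above is the authoritative version.

// A minimal sketch only: assumes the Node2vecAlgo / Node2vecConfig API of nebula-algorithm;
// cross-check the gist and the library source before relying on it.
import org.apache.spark.sql.{DataFrame, SparkSession}
import com.vesoft.nebula.algorithm.config.Node2vecConfig
import com.vesoft.nebula.algorithm.lib.Node2vecAlgo

object Node2vecExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("node2vec-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // The algorithms take an edge DataFrame whose first column is src and second is dst.
    // A tiny in-memory graph is used here instead of reading from NebulaGraph,
    // just to show the call shape with illustrative parameter values.
    val edges: DataFrame = Seq((1L, 2L), (2L, 3L), (3L, 1L), (3L, 4L)).toDF("src", "dst")

    // Field names are assumed to mirror the node2vec options of the algorithm conf file.
    val node2vecConfig = Node2vecConfig(
      maxIter = 10, lr = 0.025,
      dataNumPartition = 10, modelNumPartition = 10,
      dim = 10, window = 3, walkLength = 5, numWalks = 3,
      p = 1.0, q = 1.0, directed = true, degree = 30,
      embSeparate = ",", modelPath = "hdfs://namenode:9000/model")

    // Last argument is hasWeight = false: no weight column in the edge DataFrame.
    val embeddings = Node2vecAlgo(spark, edges, node2vecConfig, false)
    embeddings.show(truncate = false)

    spark.stop()
  }
}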
What version of Spark/Scala should I be using to run nebula-algorithm?
Can you provide a Spark 3.0.0-compatible version?
For now it's 2.4.x only, as documented; could you possibly use 2.4.x first?
I noticed nebula-exchange added Spark 3.0.0 support in https://github.com/vesoft-inc/nebula-exchange/pull/41, but the equivalent work is not yet planned for nebula-algorithm; I created an issue for it just now.
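If it helps while staying on 2.4.x, a minimal sbt sketch pinned to Spark 2.4.6 / Scala 2.11.12 could look like the following; the nebula-algorithm coordinates and version here are my assumption, so please pick the release documented for your NebulaGraph version.

// build.sbt (a sketch only; versions are assumptions, adjust to your environment)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.6" % Provided,
  "org.apache.spark" %% "spark-sql"   % "2.4.6" % Provided,
  "org.apache.spark" %% "spark-mllib" % "2.4.6" % Provided,
  // assumed coordinates; check Maven Central for the version matching your NebulaGraph release
  "com.vesoft" % "nebula-algorithm" % "3.0.0"
)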
We normally use Spark 3.2.1 but downgraded to Spark 2.4.6 and Scala 2.11.12 for Nebula. We are getting the error below:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
$NebulaDataFrameReader.loadEdgesToDF(package.scala:146)
It's blocking us from making any progress; can you expedite support for Spark 3?
@Nicole00 could you help point us in the right direction on why java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
is encountered with Spark 2.4.6 and Scala 2.11.12, please?
@sunkararp Before @Nicole00 can look into it, maybe you could refer to my nebula-up playground environment to see the differences:
https://github.com/wey-gu/nebula-up/
After running curl -fsSL nebula-up.siwei.io/all-in-one.sh | bash -s -- v3 spark
on a machine with Docker, you will have a NebulaGraph cluster plus Spark 2.4.
Then ~/.nebula-up/nebula-algo-pagerank-example.sh will run PageRank in the Spark container.
You can enter the Spark container with docker exec -it spark_master_1 bash
to check how it differs from your 2.4.6 setup.
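Also, since org.apache.spark.sql.sources.v2.ReadSupport only exists in the Spark 2.x DataSource V2 API and was removed in Spark 3.0, this NoClassDefFoundError usually means a Spark 3.x runtime (or a mixed classpath) is being picked up even though you intended 2.4.6. A minimal sketch to confirm what the job actually resolves; nothing nebula-specific here, you can paste it into spark-shell on the same cluster:

// Prints the Spark/Scala versions the JVM actually resolves on this classpath.
println(s"Spark: ${org.apache.spark.SPARK_VERSION}")
println(s"Scala: ${scala.util.Properties.versionNumberString}")

// ReadSupport belongs to the Spark 2.x DataSource V2 API and is gone in Spark 3.x;
// if this prints "missing", the runtime on the classpath is not 2.4.x.
try {
  Class.forName("org.apache.spark.sql.sources.v2.ReadSupport")
  println("Spark 2.x DataSource V2 API present")
} catch {
  case _: ClassNotFoundException => println("Spark 2.x DataSource V2 API missing")
}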
I'm finally able to run it on Spark 2.4.6, Scala 2.11.12 and OpenJDK 64-Bit 1.8.0_252,
but I'm getting java.lang.NullPointerException.
Can you please look into this ASAP?
Below is my spark-submit:
spark-submit --master "spark://10.155.48.35:7077" \
  --conf spark.driver.extraClassPath=/home/jovyan/* \
  --conf spark.executor.extraClassPath=/home/jovyan/* \
  --conf spark.executor.instances=3 \
  --conf spark.executor.memory=16G \
  --conf spark.driver.maxResultSize=10G \
  --conf spark.driver.host=10.155.50.21 \
  --class com.vesoft.nebula.algorithm.Main \
  --packages "com.vesoft:nebula-spark-connector:3.0.0,org.apache.spark:spark-core_2.11:2.4.4,org.apache.spark:spark-sql_2.11:2.4.4,com.github.scopt:scopt_2.11:3.7.1,com.typesafe:config:1.4.0,org.apache.spark:spark-mllib_2.11:2.4.4" \
  --deploy-mode client \
  nebula-algorithm-3.0.0.jar -p dev.algorithm.conf
Below is my conf file:
{
  spark: {
    app: {
      name: My Graph Algorithm 1.0
      partitionNum: 100
    }
    master: local
  }
  data: {
    source: nebula
    sink: nebula
    hasWeight: false
  }
  nebula: {
    read: {
      graphAddress: "10.0.195.64:9669"
      metaAddress: "10.0.213.158:9559"
      space: StudentCentral
      user: root
      pswd: nebula
      labels: ["STUDENT_HAS_CLASS_TCODE"]
    }
    write: {
      graphAddress: "10.0.195.64:9669"
      metaAddress: "10.0.213.158:9559"
      user: root
      pswd: nebula
      space: StudentCentral
      tag: Student
      type: update
    }
  }
  algorithm: {
    executeAlgo: node2vec
    node2vec: {
      maxIter: 10,
      lr: 0.025,
      dataNumPartition: 10,
      modelNumPartition: 10,
      dim: 10,
      window: 3,
      walkLength: 1,
      numWalks: 3,
      p: 1.0,
      q: 1.0,
      directed: true,
      degree: 30,
      embSeparate: ",",
      modelPath: "hdfs://namenode:9000/model"
    }
  }
}
Great to see your explorations and results 👍🏻, sorry I couldn't help you with them.
- java.lang.NullPointerException
- Implementation wasn't using Spark worker nodes; was it a known issue?
Could you please help look into these, @Nicole00?
For large data, we are getting a java.lang.OutOfMemoryError: GC overhead limit exceeded exception. Do you think it's due to a memory leak, or did it simply exhaust your cluster's memory?
This implementation works for a small dataset.
For a large dataset, you need a huge amount of memory to process it. Also, it doesn't use Spark's distributed capabilities.
Do you have any modifications for huge datasets? Any Pregel-based solutions?
As the subject says, can we have a Node2Vec example? @wey-gu