porscheme closed this issue 1 year ago
It should be like this. @Nicole00, could you help confirm this will work? If so, I could prepare a PR adding examples to the conf file.
```
data: {
  # data source. one of nebula, nebula-ngql, csv, json
  source: nebula-ngql
  ...
  nebula: {
    read: {
      metaAddress: "127.0.0.1:9559"
      graphAddress: "127.0.0.1:9669"
      space: basketballplayer
      labels: ["follow"]
      weightCols: ["degree"]
      ngql: "MATCH ()-[e:follow]->() RETURN e LIMIT 100000"
    }
  }
}
```
Thanks @wey-gu for the quick reply. It looks like nebula-algorithm doesn't work with string VIDs, can you confirm? Given that, how do I convert our string VIDs to integers using the algorithm interface?
For non-integer String data, it is recommended to use the algorithm interface. You can use the dense_rank function of SparkSQL to encode the data as the Long type instead of the String type.
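For illustration, here is a minimal plain-Scala sketch of that encoding idea. In an actual Spark job you would apply `dense_rank().over(Window.orderBy(...))` to the VID column; the object name and sample VID strings below are made up:

```scala
// Sketch: dense-rank-style encoding of string VIDs into Long ids.
// In SparkSQL this mapping would come from dense_rank() over the VID
// column; here it is built with plain collections for illustration.
object VidEncoding {
  // Build a String -> Long dictionary (1-based, like dense_rank).
  def encode(vids: Seq[String]): Map[String, Long] =
    vids.distinct.sorted.zipWithIndex.map { case (v, i) => v -> (i + 1L) }.toMap

  def main(args: Array[String]): Unit = {
    val vids = Seq("0033af94", "a1b2c3d4", "0033af94", "ffee0011")
    val dict = encode(vids)
    println(dict("0033af94")) // -> 1 (sorts first among the distinct VIDs)
    println(dict("ffee0011")) // -> 3
  }
}
```

The resulting Long values can then serve as vertex ids, with the dictionary kept around to translate algorithm results back to the original string VIDs.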
Actually, it now supports numerical vid generation and auto-mapping; just add encodeId: true to the algo config, see https://github.com/vesoft-inc/nebula-algorithm/pull/68
You mean like below?
```
algorithm: {
  executeAlgo: node2vec
  node2vec: {
    encodeId: true
    maxIter: 5,
    lr: 0.025,
    dataNumPartition: 15,
    modelNumPartition: 10,
    dim: 9,
    window: 2,
    walkLength: 4,
    numWalks: 10,
    p: 0.5,
    q: 0.5,
    directed: false,
    degree: 2,
    embSeparate: ",",
    modelPath: "/mnt/data/sparkdata/word2vec"
  }
}
```
Yes
I'm getting this error and I'm not sure why. Below, "0033af94-95f2-ec6d-ac72-f75f4d00622a" is a VID:
```
{"level":"WARN","timestamp":"2023-03-22 04:43:17,806","thread":"main","message":"The jar local:///mnt/spark/work/nebula-algorithm-3.0-SNAPSHOT.jar has been added already. Overwriting of added jars is not supported in the current version."}
{"level":"WARN","timestamp":"2023-03-22 04:43:18,145","thread":"main","message":"returnCols is empty and your result will contain all properties for HAS_CONDITION"}
{"level":"WARN","timestamp":"2023-03-22 04:43:20,948","thread":"Executor task launch worker for task 0","message":"Putting block rdd_6_0 failed due to exception java.lang.NumberFormatException: For input string: "0033af94-95f2-ec6d-ac72-f75f4d00622a"."}
{"level":"WARN","timestamp":"2023-03-22 04:43:20,949","thread":"Executor task launch worker for task 0","message":"Block rdd_6_0 could not be removed as it was not found on disk or in memory"}
{"level":"ERROR","timestamp":"2023-03-22 04:43:20,959","thread":"Executor task launch worker for task 0","message":"Exception in task 0.0 in stage 0.0 (TID 0)"}
java.lang.NumberFormatException: For input string: "0033af94-95f2-ec6d-ac72-f75f4d00622a"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:277)
    at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
    at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:29)
    at com.vesoft.nebula.algorithm.utils.NebulaUtil$$anonfun$1.apply(NebulaUtil.scala:25)
    at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
    at org.apache.spark.sql.execution.MapElementsExec$$anonfun$7$$anonfun$apply$1.apply(objects.scala:236)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$class.foreach(Iterator.scala:891)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
    at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:107)
    at org.apache.spark.graphx.EdgeRDD$$anonfun$1.apply(EdgeRDD.scala:105)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:875)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:359)
    at org.apache.spark.rdd.RDD$$anonfun$7.apply(RDD.scala:357)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1165)
    at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
    at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
    at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
    at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:357)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:308)
    at org.apache.spark.graphx.EdgeRDD.compute(EdgeRDD.scala:50)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
@Nicole00 I think encodeId: true is supported for the main entry of nebula-algorithm, or is it actually not?
And @porscheme you are using the latest version of nebula-algo, right?
I cloned https://github.com/vesoft-inc/nebula-algorithm a few hours ago, so I'm using the latest.
Oh, now I see: node2vec does not yet support encodeId, so for now you have to map the string VIDs to integers yourself.
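Until node2vec supports encodeId, the manual mapping can be sketched like this in plain Scala. The object name and edge data are invented for illustration; in practice you would build the same dictionary over your Spark DataFrame of edges:

```scala
// Sketch: manually map string VID endpoints of an edge list to Long ids
// before running an algorithm, keeping a reverse dictionary to decode
// the results afterwards.
object VidMapping {
  type StrEdge  = (String, String, Double) // (srcVid, dstVid, weight)
  type LongEdge = (Long, Long, Double)

  def encodeEdges(edges: Seq[StrEdge]): (Seq[LongEdge], Map[Long, String]) = {
    // Dictionary over every VID that appears as a source or destination.
    val dict: Map[String, Long] =
      edges.flatMap { case (s, d, _) => Seq(s, d) }
        .distinct.zipWithIndex
        .map { case (v, i) => v -> (i + 1L) }.toMap
    val encoded = edges.map { case (s, d, w) => (dict(s), dict(d), w) }
    (encoded, dict.map(_.swap)) // reverse map: Long id -> original VID
  }

  def main(args: Array[String]): Unit = {
    val edges = Seq(("0033af94-95f2", "4a7c11d0-0b21", 1.0),
                    ("4a7c11d0-0b21", "0033af94-95f2", 2.0))
    val (encoded, reverse) = encodeEdges(edges)
    println(encoded.head)             // (1,2,1.0)
    println(reverse(encoded.head._1)) // 0033af94-95f2
  }
}
```

After running node2vec on the Long ids, the `reverse` map translates each embedding's vertex id back to the original string VID.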
@porscheme Hi, as with the previous issue you created, this issue has been closed due to a lack of updates for a long time. If you have any updates, feel free to reopen it.
Again, thanks a lot for your contribution 😊
General Question
Hi @wey-gu
Per the comment in the application.conf file, the data source can be nebula-ngql. Could you please provide a sample? I want to try this feature.
Thanks
Below is an extract from the application.conf file: