joandre / MCL_spark

An implementation of the Markov Clustering algorithm for Spark in Scala
MIT License

Found duplicate indices? #19

Open mansanhg opened 6 years ago

mansanhg commented 6 years ago

Hello, I'm trying to use this code to load a different graph; however, this exception is thrown:

Job aborted due to stage failure: Task 15 in stage 21.0 failed 4 times, most recent failure: Lost task 15.3 in stage 21.0 (TID 4221, worker103.hathi.surfsara.nl, executor 12): java.lang.IllegalArgumentException: requirement failed: Found duplicate indices: 44457.
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.mllib.linalg.Vectors$$anonfun$sparse$1.apply$mcVI$sp(Vectors.scala:320)
    at org.apache.spark.mllib.linalg.Vectors$$anonfun$sparse$1.apply(Vectors.scala:319)
    at org.apache.spark.mllib.linalg.Vectors$$anonfun$sparse$1.apply(Vectors.scala:319)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at org.apache.spark.mllib.linalg.Vectors$.sparse(Vectors.scala:319)
    at org.apache.spark.mllib.clustering.MCLUtils$$anonfun$8.apply(MCLUtils.scala:195)
    at org.apache.spark.mllib.clustering.MCLUtils$$anonfun$8.apply(MCLUtils.scala:194)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:185)
    at scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1336)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1012)
    at org.apache.spark.rdd.RDD$$anonfun$reduce$1$$anonfun$15.apply(RDD.scala:1010)
    at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1987)
    at org.apache.spark.SparkContext$$anonfun$32.apply(SparkContext.scala:1987)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
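For context, the trace points at `Vectors.sparse` called from `MCLUtils.scala:195`: Spark MLlib's `Vectors.sparse` requires every index in a sparse vector to be unique, so two entries for the same column fail the `require` check. A minimal sketch (standalone, outside MCL_spark) reproducing the same exception:

```scala
import org.apache.spark.mllib.linalg.Vectors

// Vectors.sparse rejects repeated indices with
// "requirement failed: Found duplicate indices: ..."
val ok  = Vectors.sparse(5, Seq((0, 1.0), (3, 2.0))) // fine
val bad = Vectors.sparse(5, Seq((3, 1.0), (3, 2.0))) // throws IllegalArgumentException
```

This suggests my graph is producing two entries for the same column in a row of the adjacency matrix, i.e. parallel edges between the same pair of nodes.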

joandre commented 6 years ago

Hello,

Can you provide your example graph? Did you compile the latest code version locally, based on Spark 2.1.1?
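If your graph contains parallel edges (more than one edge between the same pair of vertices), that would produce duplicate column indices when the adjacency rows are built. A possible workaround, sketched below with a hypothetical merge-by-sum helper (`groupEdges` only merges edges that are co-located, hence the `partitionBy` first), would be to collapse duplicates before running MCL:

```scala
import org.apache.spark.graphx.{Graph, PartitionStrategy}

// Sketch: collapse parallel edges by summing their weights.
// partitionBy co-locates duplicate (src, dst) edges so that
// groupEdges can merge them into a single edge.
def dedupEdges[VD](graph: Graph[VD, Double]): Graph[VD, Double] =
  graph
    .partitionBy(PartitionStrategy.CanonicalRandomVertexCut)
    .groupEdges(_ + _)
```

Whether summing, taking the max, or keeping one weight is appropriate depends on what the duplicate edges mean in your data.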