pferrel / template-scala-parallel-universal-recommendation


error java.lang.NegativeArraySizeException #15

Closed dipenpatel235 closed 8 years ago

dipenpatel235 commented 8 years ago

```
[root@localhost UniversalRecommendation]# pio build
[INFO] [Console$] Using existing engine manifest JSON at /root/PredictionIO/UniversalRecommendation/manifest.json
[INFO] [Console$] Using command '/root/PredictionIO/sbt/sbt' at the current working directory to build.
[INFO] [Console$] If the path above is incorrect, this process will fail.
[INFO] [Console$] Uber JAR disabled. Making sure lib/pio-assembly-0.9.5.jar is absent.
[INFO] [Console$] Going to run: /root/PredictionIO/sbt/sbt package assemblyPackageDependency
[INFO] [Console$] Build finished successfully.
[INFO] [Console$] Looking for an engine...
[INFO] [Console$] Found template-scala-parallel-universal-recommendation_2.10-0.2.3.jar
[INFO] [Console$] Found template-scala-parallel-universal-recommendation-assembly-0.2.3-deps.jar
[INFO] [RegisterEngine$] Registering engine aAMwBiqqgflKKwevIQMuNFoIrI7QyiK2 bf152c9df4a5e3789b0f276e3648122ea96bc395
[INFO] [Console$] Your engine is ready for training.
```

```
[root@localhost UniversalRecommendation]# pio train
[INFO] [Console$] Using existing engine manifest JSON at /root/PredictionIO/UniversalRecommendation/manifest.json
[INFO] [Runner$] Submission command: /root/PredictionIO/vendors/spark-1.5.1/bin/spark-submit --class io.prediction.workflow.CreateWorkflow --jars file:/root/PredictionIO/UniversalRecommendation/target/scala-2.10/template-scala-parallel-universal-recommendation_2.10-0.2.3.jar,file:/root/PredictionIO/UniversalRecommendation/target/scala-2.10/template-scala-parallel-universal-recommendation-assembly-0.2.3-deps.jar --files file:/root/PredictionIO/conf/log4j.properties,file:/root/PredictionIO/vendors/elasticsearch-1.4.4/config/elasticsearch.yml,file:/root/PredictionIO/vendors/hbase-1.0.0/conf/hbase-site.xml --driver-class-path /root/PredictionIO/conf:/root/PredictionIO/vendors/elasticsearch-1.4.4/config:/root/PredictionIO/vendors/hbase-1.0.0/conf file:/root/PredictionIO/lib/pio-assembly-0.9.5.jar --engine-id aAMwBiqqgflKKwevIQMuNFoIrI7QyiK2 --engine-version bf152c9df4a5e3789b0f276e3648122ea96bc395 --engine-variant file:/root/PredictionIO/UniversalRecommendation/engine.json --verbosity 0 --json-extractor Both --env PIO_STORAGE_SOURCES_HBASE_TYPE=hbase,PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/root/.pio_store,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost,PIO_STORAGE_SOURCES_HBASE_HOME=/root/PredictionIO/vendors/hbase-1.0.0,PIO_HOME=/root/PredictionIO,PIO_FS_ENGINESDIR=/root/.pio_store/engines,PIO_STORAGE_SOURCES_LOCALFS_PATH=/root/.pio_store/models,PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=/root/PredictionIO/vendors/elasticsearch-1.4.4,PIO_FS_TMPDIR=/root/.pio_store/tmp,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE,PIO_CONF_DIR=/root/PredictionIO/conf,PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300,PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
[INFO] [Engine] Extracting datasource params...
[INFO] [WorkflowUtils$] No 'name' is found. Default empty String will be used.
[INFO] [Engine] Datasource params: (,DataSourceParams(UniversalRecommendation,List(purchase, view)))
[INFO] [Engine] Extracting preparator params...
[INFO] [Engine] Preparator params: (,Empty)
[INFO] [Engine] Extracting serving params...
[INFO] [Engine] Serving params: (,Empty)
[INFO] [Remoting] Starting remoting
[INFO] [Remoting] Remoting started; listening on addresses :[akka.tcp://sparkDriver@142.4.210.144:59642]
[WARN] [MetricsSystem] Using default name DAGScheduler for source because spark.app.id is not set.
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.template.DataSource@304c186f
[INFO] [Engine$] Preparator: org.template.Preparator@496ed82e
[INFO] [Engine$] AlgorithmList: List(org.template.URAlgorithm@6bd83def)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] org.template.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.template.PreparedData does not support data sanity check. Skipping check.
[INFO] [URAlgorithm] Actions read now creating correlators
[ERROR] [Executor] Exception in task 0.0 in stage 23.0 (TID 17)
[WARN] [TaskSetManager] Lost task 0.0 in stage 23.0 (TID 17, localhost): java.lang.NegativeArraySizeException
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:78)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

[ERROR] [TaskSetManager] Task 0 in stage 23.0 failed 1 times; aborting job
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 23.0 failed 1 times, most recent failure: Lost task 0.0 in stage 23.0 (TID 17, localhost): java.lang.NegativeArraySizeException
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:78)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:567)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1822)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1942)
	at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1003)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
	at org.apache.spark.rdd.RDD.reduce(RDD.scala:985)
	at org.apache.mahout.sparkbindings.SparkEngine$.numNonZeroElementsPerColumn(SparkEngine.scala:86)
	at org.apache.mahout.math.drm.CheckpointedOps.numNonZeroElementsPerColumn(CheckpointedOps.scala:37)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.sampleDownAndBinarize(SimilarityAnalysis.scala:286)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrences(SimilarityAnalysis.scala:66)
	at org.apache.mahout.math.cf.SimilarityAnalysis$.cooccurrencesIDSs(SimilarityAnalysis.scala:141)
	at org.template.URAlgorithm.calcAll(URAlgorithm.scala:143)
	at org.template.URAlgorithm.train(URAlgorithm.scala:117)
	at org.template.URAlgorithm.train(URAlgorithm.scala:102)
	at io.prediction.controller.P2LAlgorithm.trainBase(P2LAlgorithm.scala:46)
	at io.prediction.controller.Engine$$anonfun$18.apply(Engine.scala:688)
	at io.prediction.controller.Engine$$anonfun$18.apply(Engine.scala:688)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at io.prediction.controller.Engine$.train(Engine.scala:688)
	at io.prediction.controller.Engine.train(Engine.scala:174)
	at io.prediction.workflow.CoreWorkflow$.runTrain(CoreWorkflow.scala:65)
	at io.prediction.workflow.CreateWorkflow$.main(CreateWorkflow.scala:247)
	at io.prediction.workflow.CreateWorkflow.main(CreateWorkflow.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NegativeArraySizeException
	at org.apache.mahout.math.DenseVector.<init>(DenseVector.java:57)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:78)
	at org.apache.mahout.sparkbindings.SparkEngine$$anonfun$5.apply(SparkEngine.scala:77)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$17.apply(RDD.scala:706)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:297)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
```

pferrel commented 8 years ago

The root repo for the Universal Recommender is here: https://github.com/actionml/template-scala-parallel-universal-recommendation and the Google Group for support is here: https://groups.google.com/forum/#!forum/actionml

The java.lang.NegativeArraySizeException means there is no data for your primary event. This can happen because you haven't set up eventNames with the primary event named first in the array, or because an event name is misspelled. To answer more specifically I need to see your engine.json and a snippet of your input; sketches of both are shown below. Please respond on the group.
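For reference, a minimal sketch of the relevant parts of an engine.json for this version of the template. The appName and the eventNames arrays match the DataSourceParams(UniversalRecommendation,List(purchase, view)) line in the training log above; the id, description, engineFactory, indexName, and typeName values are the template's illustrative defaults, not taken from the reporter's setup. The first entry in each eventNames array is treated as the primary event.

```json
{
  "comment": "Sketch only: the first event in eventNames is the primary event and must have data",
  "id": "default",
  "description": "Default settings",
  "engineFactory": "org.template.RecommendationEngine",
  "datasource": {
    "params": {
      "appName": "UniversalRecommendation",
      "eventNames": ["purchase", "view"]
    }
  },
  "algorithms": [
    {
      "name": "ur",
      "params": {
        "appName": "UniversalRecommendation",
        "indexName": "urindex",
        "typeName": "items",
        "eventNames": ["purchase", "view"]
      }
    }
  ]
}
```

An input snippet for the primary event is a standard PredictionIO event POSTed to the EventServer's /events.json endpoint. A hypothetical example, with placeholder entity IDs:

```json
{
  "event": "purchase",
  "entityType": "user",
  "entityId": "u-1",
  "targetEntityType": "item",
  "targetEntityId": "i-1"
}
```

If the event store contains no events of the first-listed type (purchase here), the primary-action matrix handed to Mahout's SimilarityAnalysis is empty, which is consistent with the NegativeArraySizeException thrown from the DenseVector constructor in the stack trace above.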