qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark
http://sparklens.qubole.com
Apache License 2.0

StageSkewAnalyzer: Arithmetic Exception: division by zero #24

Open enaggar opened 6 years ago

enaggar commented 6 years ago

I'm receiving this error when running sparklens on a Spark history file:

Failed in Analyzer StageSkewAnalyzer
java.lang.ArithmeticException: / by zero
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:109)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer$$anonfun$computePerStageEfficiencyStatistics$3.apply(StageSkewAnalyzer.scala:90)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.computePerStageEfficiencyStatistics(StageSkewAnalyzer.scala:90)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:33)
        at com.qubole.sparklens.analyzer.AppAnalyzer$class.analyze(AppAnalyzer.scala:32)
        at com.qubole.sparklens.analyzer.StageSkewAnalyzer.analyze(StageSkewAnalyzer.scala:27)
        at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:91)
        at com.qubole.sparklens.analyzer.AppAnalyzer$$anonfun$startAnalyzers$1.apply(AppAnalyzer.scala:89)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
        at com.qubole.sparklens.analyzer.AppAnalyzer$.startAnalyzers(AppAnalyzer.scala:89)
        at com.qubole.sparklens.QuboleJobListener.onApplicationEnd(QuboleJobListener.scala:168)
        at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:57)
        at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
        at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
        at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
        at org.apache.spark.scheduler.ReplayListenerBus.postToAll(ReplayListenerBus.scala:35)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:85)
        at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at com.qubole.sparklens.app.EventHistoryReporter.<init>(EventHistoryReporter.scala:38)
        at com.qubole.sparklens.app.ReporterApp$.parseInput(ReporterApp.scala:54)
        at com.qubole.sparklens.app.ReporterApp$.delayedEndpoint$com$qubole$sparklens$app$ReporterApp$1(ReporterApp.scala:27)
        at com.qubole.sparklens.app.ReporterApp$delayedInit$body.apply(ReporterApp.scala:20)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.App$$anonfun$main$1.apply(App.scala:76)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
        at scala.App$class.main(App.scala:76)
        at com.qubole.sparklens.app.ReporterApp$.main(ReporterApp.scala:20)
        at com.qubole.sparklens.app.ReporterApp.main(ReporterApp.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

In the output I can see total number of cores available = 10 and total number of executors = 11. What could be the cause of this? With integer division, 10 cores over 11 executors makes the executorCores variable equal to zero, which leads to the exception above.
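A minimal sketch of the suspected failure mode (not the actual sparklens source; variable names are assumptions based on the description above): with 10 cores spread over 11 executors, integer division gives 0 cores per executor, and any later division by that value throws the ArithmeticException seen in the stack trace.

    // Hypothetical repro of the suspected integer-division issue.
    object ExecutorCoresRepro {
      def main(args: Array[String]): Unit = {
        val totalCores     = 10
        val totalExecutors = 11

        // Integer division: 10 / 11 == 0
        val executorCores = totalCores / totalExecutors
        println(s"executorCores = $executorCores")

        // Dividing by it reproduces the reported error:
        // val perCoreTime = someStageTime / executorCores  // ArithmeticException: / by zero

        // One possible defensive workaround is to clamp the value before dividing:
        val safeExecutorCores = math.max(executorCores, 1)
        println(s"safeExecutorCores = $safeExecutorCores")
      }
    }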

iamrohit commented 5 years ago

Thanks for raising this issue. I will check and get back to you shortly. Are you using dynamic allocation / autoscaling of executors?

kashif110 commented 5 years ago

Getting the same error when running in a Databricks notebook.

emschimmel commented 5 years ago

EfficiencyStatisticsAnalyzer and StageSkewAnalyzer both throw this error in a Jupyter notebook, and it seems to have the same cause: in AppContext.getMaxConcurrent, the maxConcurrent value never gets higher than 0 in those cases. We don't use dynamic allocation of executors.
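As a sketch only (assumed names, not the actual sparklens fix), an analyzer could bail out instead of dividing when the computed concurrency is zero, which matches the symptom described above:

    // Hypothetical guard: skip the efficiency computation when maxConcurrent is 0.
    def perStageEfficiency(stageWallClockMillis: Long, maxConcurrent: Long): Option[Double] =
      if (maxConcurrent <= 0) {
        // Nothing meaningful to report; avoids ArithmeticException: / by zero.
        None
      } else {
        Some(stageWallClockMillis.toDouble / maxConcurrent)
      }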

emschimmel commented 5 years ago

@iamrohit Thanks for fixing this ;)