linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

Failed to Analyze spark 1.6 history application #366

Open nichoc opened 6 years ago

nichoc commented 6 years ago

We compiled Dr. Elephant (customSHSWork branch) and deployed it on our Hadoop YARN cluster for testing. We tried both the REST API SparkFetcher and spark.fetchers.FSFetcher, but recent Spark jobs fail to load (MapReduce jobs load successfully). Looking forward to any tips. Thanks!
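For reference, the Spark fetcher is configured in FetcherConf.xml roughly as follows. This is a sketch based on the sample config shipped with Dr. Elephant; the exact element names, the full FSFetcher class name, and any extra params may differ on the customSHSWork branch:

<fetchers>
  <!-- REST-based fetcher: pulls application data and event logs from the Spark history server API -->
  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  </fetcher>
  <!-- Alternative we also tried: filesystem-based fetcher that reads event logs directly -->
  <!--
  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.FSFetcher</classname>
  </fetcher>
  -->
</fetchers>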

Here is the log:

04-11-2018 15:47:09 INFO  [ForkJoinPool-3-worker-57] com.linkedin.drelephant.spark.fetchers.SparkRestClient : creating SparkApplication by calling REST API at http://myhost:23764/api/v1/applications/application_1521785315204_12497/logs to get eventlogs
04-11-2018 15:47:09 INFO  [dr-el-executor-thread-0] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Succeeded fetching data for application_1521785315204_12497
04-11-2018 15:47:09 ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : None.get
04-11-2018 15:47:09 ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : java.util.NoSuchElementException: None.get
    at scala.None$.get(Option.scala:313)
    at scala.None$.get(Option.scala:311)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator$$anonfun$getErrorsSeverity$1.apply(StagesWithFailedTasksHeuristic.scala:83)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator$$anonfun$getErrorsSeverity$1.apply(StagesWithFailedTasksHeuristic.scala:79)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator.getErrorsSeverity(StagesWithFailedTasksHeuristic.scala:79)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator.x$1$lzycompute(StagesWithFailedTasksHeuristic.scala:143)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator.x$1(StagesWithFailedTasksHeuristic.scala:143)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator.stagesWithOOMError$lzycompute(StagesWithFailedTasksHeuristic.scala:143)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic$Evaluator.stagesWithOOMError(StagesWithFailedTasksHeuristic.scala:143)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic.apply(StagesWithFailedTasksHeuristic.scala:42)
    at com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic.apply(StagesWithFailedTasksHeuristic.scala:31)
    at com.linkedin.drelephant.analysis.AnalyticJob.getAnalysis(AnalyticJob.java:265)
    at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:175)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
nichoc commented 6 years ago

It looks like the Spark 1.6 history server cannot support StagesWithFailedTasksHeuristic (hence the None.get in getErrorsSeverity in the trace above). We removed the corresponding configuration from HeuristicConf.xml and FetcherConf.xml, after which Spark jobs are analyzed normally and show up on the web UI.
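For anyone hitting the same issue, the entry we removed from HeuristicConf.xml looks roughly like this. The classname comes from the stack trace above; the heuristicname and viewname values are approximate and may differ slightly in your branch:

<!-- Remove (or comment out) this heuristic entry to skip the failing analysis -->
<heuristic>
  <applicationtype>spark</applicationtype>
  <heuristicname>Spark Stages with failed tasks</heuristicname>
  <classname>com.linkedin.drelephant.spark.heuristics.StagesWithFailedTasksHeuristic</classname>
  <viewname>views.html.help.spark.helpStagesWithFailedTasks</viewname>
</heuristic>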