linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

The percent of resources wasted was negative for yarn-cluster spark streaming job #180

Open yfluo opened 7 years ago

yfluo commented 7 years ago

12-29-2016 13:11:46 INFO [Thread-6] com.linkedin.drelephant.ElephantRunner : Job queue size is 1
12-29-2016 13:11:46 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing SPARK application_1482973897405_0006
12-29-2016 13:11:46 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Fetching data for application_1482973897405_0006
12-29-2016 13:11:46 INFO [ForkJoinPool-1-worker-7] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://{$ip}:18080/api/v1/applications/application_1482973897405_0006
12-29-2016 13:11:46 INFO [ForkJoinPool-1-worker-7] com.linkedin.drelephant.spark.fetchers.SparkLogClient : looking for logs at webhdfs://${ip}:50070/user/spark/eventLog/application_1482973897405_0006_1.snappy
12-29-2016 13:11:46 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : ERROR executing DML bindLog[] error[Data truncation: Out of range value for column 'resource_wasted' at row 1]
12-29-2016 13:11:46 ERROR [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : javax.persistence.PersistenceException: ERROR executing DML bindLog[] error[Data truncation: Out of range value for column 'resource_wasted' at row 1]
    at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:97)
    at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.insert(DmlBeanPersister.java:57)
    at com.avaje.ebeaninternal.server.persist.DefaultPersistExecute.executeInsertBean(DefaultPersistExecute.java:66)
    at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeNow(PersistRequestBean.java:448)
    at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeOrQueue(PersistRequestBean.java:478)
    at com.avaje.ebeaninternal.server.persist.DefaultPersister.insert(DefaultPersister.java:335)
    at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveEnhanced(DefaultPersister.java:310)
    at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveRecurse(DefaultPersister.java:280)
    at com.avaje.ebeaninternal.server.persist.DefaultPersister.save(DefaultPersister.java:248)
    at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1568)
    at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1558)
    at com.avaje.ebean.Ebean.save(Ebean.java:453)
    at play.db.ebean.Model.save(Model.java:91)
    at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:178)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Out of range value for column 'resource_wasted' at row 1
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3845)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3783)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2447)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2594)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2545)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1901)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2113)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2049)
    at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2034)
    at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
    at com.avaje.ebeaninternal.server.type.DataBind.executeUpdate(DataBind.java:55)
    at com.avaje.ebeaninternal.server.persist.dml.InsertHandler.execute(InsertHandler.java:134)
    at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:86)
    ... 18 more

bretlowery commented 7 years ago

I had this issue as well. I worked around it by modifying the run() method of ExecutorJob in ElephantRunner.java (the frame at ElephantRunner.java:178 in the stack trace) as follows. Change these two lines:

          AppResult result = _analyticJob.getAnalysis();
          result.save();

to:

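          AppResult result = _analyticJob.getAnalysis();
          // Clamp negative values to 0 before saving; a negative resource_wasted
          // triggers the MySQL "Out of range value" error in the log above.
          if (result.resourceUsed < 0) {
            result.resourceUsed = 0;
          }
          if (result.resourceWasted < 0) {
            result.resourceWasted = 0;
          }
          if (result.totalDelay < 0) {
            result.totalDelay = 0;
          }
          result.save();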
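
For anyone who wants to try the clamping logic outside of Dr. Elephant, here is a minimal standalone sketch of the same idea. The names `ClampSketch`, `AppResultSketch`, `clampToZero`, and `sanitize` are hypothetical stand-ins made up for illustration and are not part of the Dr. Elephant code base; only the three field names mirror the ones used in the workaround above. Note that this only hides the negative values at save time and does not address why the heuristics produced a negative number in the first place.

```java
public class ClampSketch {

    // Hypothetical stand-in for the result object; only the three fields
    // touched by the workaround are modeled here.
    public static class AppResultSketch {
        public long resourceUsed;
        public long resourceWasted;
        public long totalDelay;
    }

    // Returns the value unchanged when it is non-negative, otherwise 0.
    public static long clampToZero(long value) {
        return Math.max(0L, value);
    }

    // Applies the clamp to every field that the workaround guards.
    public static void sanitize(AppResultSketch result) {
        result.resourceUsed = clampToZero(result.resourceUsed);
        result.resourceWasted = clampToZero(result.resourceWasted);
        result.totalDelay = clampToZero(result.totalDelay);
    }

    public static void main(String[] args) {
        AppResultSketch result = new AppResultSketch();
        result.resourceUsed = 1200L;
        result.resourceWasted = -350L; // the kind of value that failed to insert
        result.totalDelay = -1L;

        sanitize(result);

        System.out.println("resourceUsed=" + result.resourceUsed
            + " resourceWasted=" + result.resourceWasted
            + " totalDelay=" + result.totalDelay);
    }
}
```

In the real code the equivalent of `sanitize(result)` would sit between `getAnalysis()` and `save()`, exactly where the if-blocks are placed in the snippet above.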