linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

Cannot add or update a child row: a foreign key constraint fails #609

Open omicron8 opened 5 years ago

omicron8 commented 5 years ago

I see a lot of errors in /logs/elephant/dr_elephant.log and, as a result, no processed jobs appear in the web UI:

07-08-2019 17:00:41 WARN  [dr-el-executor-thread-1] com.linkedin.drelephant.ElephantRunner : Add analytic job id [application_1562254467554_23925] into the retry list.
07-08-2019 17:00:41 INFO  [dr-el-executor-thread-1] com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Retry queue size is 11
07-08-2019 17:00:41 INFO  [dr-el-executor-thread-1] com.linkedin.drelephant.ElephantRunner : Analyzing MAPREDUCE application_1562254467554_23937
07-08-2019 17:00:41 INFO  [dr-el-executor-thread-1] com.linkedin.drelephant.util.Utils : Truncating [FETCHER] Content parser, INFO_RELIABLE,RSS,URLS_DISCOVERY,SITEMAPS_DISCOVERY,NAILS batches,  (AutoPut mode) to 100 characters for application_1562254467554_23937
07-08-2019 17:00:41 INFO  [dr-el-executor-thread-1] com.linkedin.drelephant.util.InfoExtractor : No Scheduler found for appid: application_1562254467554_23937
07-08-2019 17:00:41 ERROR [dr-el-executor-thread-1] com.linkedin.drelephant.ElephantRunner : ERROR executing DML bindLog[] error[Cannot add or update a child row: a foreign key constraint fails (`drele_prod`.`yarn_app_heuristic_result`, CONSTRAINT `yarn_app_heuristic_result_f1` FOREIGN KEY (`yarn_app_result_id`) REFERENCES `yarn_app_result` (`id`))]
07-08-2019 17:00:41 ERROR [dr-el-executor-thread-1] com.linkedin.drelephant.ElephantRunner : javax.persistence.PersistenceException: ERROR executing DML bindLog[] error[Cannot add or update a child row: a foreign key constraint fails (`drele_prod`.`yarn_app_heuristic_result`, CONSTRAINT `yarn_app_heuristic_result_f1` FOREIGN KEY (`yarn_app_result_id`) REFERENCES `yarn_app_result` (`id`))]
        at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:97)
        at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.insert(DmlBeanPersister.java:57)
        at com.avaje.ebeaninternal.server.persist.DefaultPersistExecute.executeInsertBean(DefaultPersistExecute.java:66)
        at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeNow(PersistRequestBean.java:448)
        at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeOrQueue(PersistRequestBean.java:478)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.insert(DefaultPersister.java:335)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveEnhanced(DefaultPersister.java:310)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveRecurse(DefaultPersister.java:280)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveAssocManyDetails(DefaultPersister.java:851)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveMany(DefaultPersister.java:734)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveAssocMany(DefaultPersister.java:631)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.insert(DefaultPersister.java:339)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveEnhanced(DefaultPersister.java:310)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveRecurse(DefaultPersister.java:280)
        at com.avaje.ebeaninternal.server.persist.DefaultPersister.save(DefaultPersister.java:248)
        at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1568)
        at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1558)
        at com.avaje.ebean.Ebean.save(Ebean.java:453)
        at play.db.ebean.Model.save(Model.java:91)
        at com.linkedin.drelephant.ElephantRunner$ExecutorJob$1.run(ElephantRunner.java:399)
        at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:699)
        at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:693)
        at com.avaje.ebean.Ebean.execute(Ebean.java:1207)
        at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:397)
        at com.linkedin.drelephant.priorityexecutor.RunnableWithPriority$1.run(RunnableWithPriority.java:36)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot add or update a child row: a foreign key constraint fails (`drele_prod`.`yarn_app_heuristic_result`, CONSTRAINT `yarn_app_heuristic_result_f1` FOREIGN KEY (`yarn_app_result_id`) REFERENCES `yarn_app_result` (`id`))
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:400)
        at com.mysql.jdbc.Util.getInstance(Util.java:383)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:973)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3847)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3783)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2447)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2594)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2545)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1901)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2113)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2049)
        at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2034)
        at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
        at com.avaje.ebeaninternal.server.type.DataBind.executeUpdate(DataBind.java:55)
        at com.avaje.ebeaninternal.server.persist.dml.InsertHandler.execute(InsertHandler.java:134)
        at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:86)
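
For readers unfamiliar with this error: MySQL raises it when a row is inserted into a child table and its foreign key value has no matching row in the parent table. The sketch below reproduces the same class of failure using Python's built-in sqlite3 module instead of MySQL. The two-table schema is a simplified assumption that borrows only the table and column names from the error message; it is not Dr. Elephant's actual schema.

```python
import sqlite3


def insert_child_without_parent():
    """Insert a heuristic row whose parent result row does not exist.

    Returns the IntegrityError raised by the database, or None if the
    insert unexpectedly succeeds.
    """
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    # Simplified, assumed schema: names taken from the error message only.
    conn.execute("CREATE TABLE yarn_app_result (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE yarn_app_heuristic_result ("
        " id INTEGER PRIMARY KEY,"
        " yarn_app_result_id TEXT NOT NULL REFERENCES yarn_app_result (id))"
    )
    try:
        # No row with this id exists in yarn_app_result, so the insert fails.
        conn.execute(
            "INSERT INTO yarn_app_heuristic_result (yarn_app_result_id) VALUES (?)",
            ("application_1562254467554_23937",),
        )
    except sqlite3.IntegrityError as exc:
        return exc
    finally:
        conn.close()
    return None


if __name__ == "__main__":
    print(insert_child_without_parent())
```

By analogy, the stack trace above would be consistent with the parent `yarn_app_result` row being absent, or stored under a different `id`, at the moment the heuristic results are saved. The thread itself does not confirm the root cause.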

omicron8 commented 5 years ago

I commented out the oozie scheduler in app-conf/SchedulerConf.xml and the error disappeared. But data is still not saved to the database, so there is still nothing in the web UI. Any ideas?

ShubhamGupta29 commented 4 years ago

@omicron8 sorry for the late follow-up, can you let me know if you are still facing this issue?

omicron8 commented 4 years ago

> @omicron8 sorry for the late follow-up, can you let me know if you are still facing this issue?

Yes, the problem is still present.

ShubhamGupta29 commented 4 years ago

Are you getting any exceptions in dr_elephant.log, dr.log, or logs/application.log? Is any type of application getting analyzed?

omicron8 commented 4 years ago

Nothing except the error above. We have only MR jobs. Not a single job has been analyzed.

ShubhamGupta29 commented 4 years ago

Are you able to see log messages like Analysis of MAPREDUCE application application_xxxxxxxx took XYZ ms?

omicron8 commented 4 years ago

> Are you able to see log messages like Analysis of MAPREDUCE application application_xxxxxxxx took XYZ ms?

I grepped all the logs and found nothing.

ShubhamGupta29 commented 4 years ago

Is logs/elephant/dr_elephant.log getting populated (are logs being written)? If yes, can you confirm whether X is non-zero in the "Job queue size is X" log statement? It would be helpful if you could share your log file, masking all private details such as the RM address.
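
One possible way to mask a log before sharing it is sketched below. It is only an illustration: the function name, the placeholder strings, and the choice to mask IPv4 addresses plus an explicit list of hostnames are all assumptions, not anything Dr. Elephant provides. Hostnames are passed in explicitly because a generic host:port pattern would also match stack-trace fragments such as `DmlBeanPersister.java:97`.

```python
import re

# Assumed pattern: dotted IPv4 address with an optional :port suffix.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?\b")


def mask_line(line, hostnames=()):
    """Replace IPv4 addresses and the given hostnames with placeholders."""
    line = IPV4.sub("<masked-ip>", line)
    for name in hostnames:
        # Plain substring replacement; the caller lists the hosts to hide.
        line = line.replace(name, "<masked-host>")
    return line


def mask_file(src_path, dst_path, hostnames=()):
    """Write a masked copy of the log at src_path to dst_path."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(mask_line(line, hostnames))
```

For example, `mask_file("dr_elephant.log", "dr_elephant.masked.log", hostnames=["rm1.example.com"])` would write a copy with that (hypothetical) hostname and any IPv4 addresses replaced. The result should still be reviewed by hand, since user names, queue names, and job names are not covered.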

omicron8 commented 4 years ago

Yes, it's being populated. Here is the line with the job queue size: 02-03-2020 16:41:08 INFO [Thread-10] com.linkedin.drelephant.ElephantRunner : Job queue size is 150726

ShubhamGupta29 commented 4 years ago

This shows that Dr. Elephant is able to fetch finished applications, but they are not getting processed. You can confirm this by checking whether Y keeps increasing in the "Second Retry queue size is Y" log statement. Also look for log lines matching "Drop the analytic job. Reason: reached the max retries for application id = [XYZ]."

Also, what type of jobs are you analyzing?
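
The checks described in this comment can be sketched as a small log scan. The message texts in the patterns are copied from this thread; the function name and the shape of the returned summary are assumptions made for illustration.

```python
import re

# Message texts as quoted in this thread.
JOB_QUEUE = re.compile(r"Job queue size is (\d+)")
SECOND_RETRY_QUEUE = re.compile(r"Second Retry queue size is (\d+)")
DROPPED = re.compile(
    r"Drop the analytic job\. Reason: reached the max retries "
    r"for application id = \[([^\]]+)\]"
)


def summarize(lines):
    """Collect queue sizes and dropped application ids, in log order."""
    summary = {"job_queue": [], "second_retry_queue": [], "dropped": []}
    for line in lines:
        match = JOB_QUEUE.search(line)
        if match:
            summary["job_queue"].append(int(match.group(1)))
            continue
        match = SECOND_RETRY_QUEUE.search(line)
        if match:
            summary["second_retry_queue"].append(int(match.group(1)))
            continue
        match = DROPPED.search(line)
        if match:
            summary["dropped"].append(match.group(1))
    return summary
```

A file can be passed directly, for example `with open("logs/elephant/dr_elephant.log") as f: print(summarize(f))`. A non-zero job queue size together with a growing second retry queue and a non-empty dropped list would match the situation described here: applications are fetched but never successfully processed.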

omicron8 commented 4 years ago

Yep, I see messages like that: dr-elephant-2.1.7/logs/elephant/dr_elephant.log.2020-01-04:01-04-2020 04:58:48 ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : Drop the analytic job. Reason: reached the max retries for application id = [application_1575530104012_328247].

We analyze mr2 jobs only.

ShubhamGupta29 commented 4 years ago

Can you provide some log lines from before and after the line you gave above?

omicron8 commented 4 years ago

dr_elephant.log.2020-01-04.txt

omicron8 commented 4 years ago

hey, any news?

omicron8 commented 4 years ago

> Are you able to see log messages like Analysis of MAPREDUCE application application_xxxxxxxx took XYZ ms?

Again: if I comment out the oozie scheduler in app-conf/SchedulerConf.xml, I can see messages like this: 02-26-2020 15:19:33 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing MAPREDUCE application_1581418441444_240970

But the database is still empty.

If I uncomment the oozie scheduler in app-conf/SchedulerConf.xml, I get the following error:

Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException: Cannot add or update a child row: a foreign key constraint fails (drele_prod.yarn_app_heuristic_result, CONSTRAINT yarn_app_heuristic_result_f1 FOREIGN KEY (yarn_app_result_id) REFERENCES yarn_app_result (id))

ShubhamGupta29 commented 4 years ago

Keep the oozie scheduler in app-conf/SchedulerConf.xml commented out, and then attach the corresponding logs here. If your applications are still not getting processed, there must be some error, and we need to find it in the logs.

omicron8 commented 4 years ago

Here is the log for the last hour: dr_elephant.log

ShubhamGupta29 commented 4 years ago

@omicron8 I can't see any errors in the log provided, but it seems you have made a few changes to the code. Can you provide a diff of the changes you made?