linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

How to delete records from the database? #694

Closed ProbShin closed 4 years ago

ProbShin commented 4 years ago

Dear all, I wonder if there is a way to delete or manipulate the records in the database from the Dr. Elephant side.

For example, suppose the database takes up too much space and some old or specific job records are no longer useful. Is there an official way to delete those records from the database? Or, what did you do when the database grew too large?

My guess is that we can manually remove rows from tables like Yarn_App_Result, Yarn_App_Heuristic_Result, or Yarn_App_Heuristic_Result_Details using a database username and password. In that case, it depends on the operators, which feels less robust. Does Dr. Elephant provide official functions, mechanisms, or even settings to do this automatically?

Any ideas are appreciated. Thanks.

ShubhamGupta29 commented 4 years ago

Hi @ProbShin, unfortunately there is currently no official way to automatically purge records from the database; the delete process is manual. I wrote a Python script for this purpose. It uses basic MySQL statements to delete about 10k old apps from Yarn_app_result, after first deleting all the rows in the tables that are linked to Yarn_app_result by foreign keys. I don't have access to that script right now, but I will add it in a later comment if you are interested.
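
For readers who want a starting point in the meantime, here is a minimal sketch of the child-first delete order described above. It is not the script mentioned in this comment. The lowercase table names, the key columns (`yarn_app_result_id`, `yarn_app_heuristic_result_id`), and the `finish_time` column holding epoch milliseconds are assumptions about the schema; check them against your own database first. The function name `purge_old_apps`, the helper `make_demo_db`, and the default batch size are hypothetical. The demo runs against an in-memory SQLite stand-in so it can be tried without a MySQL server.

```python
import sqlite3


def purge_old_apps(conn, cutoff_ms, batch_size=500, placeholder="?"):
    """Delete up to batch_size apps that finished before cutoff_ms.

    conn is any DB-API connection. Pass placeholder="%s" for common MySQL
    drivers. Rows are removed children first so foreign keys stay valid.
    """
    cur = conn.cursor()
    # Pick the oldest apps below the cutoff (assumed column: finish_time).
    cur.execute(
        "SELECT id FROM yarn_app_result "
        f"WHERE finish_time < {placeholder} "
        f"ORDER BY finish_time LIMIT {int(batch_size)}",
        (cutoff_ms,),
    )
    app_ids = [row[0] for row in cur.fetchall()]
    if not app_ids:
        return 0
    marks = ", ".join([placeholder] * len(app_ids))
    # 1. Grandchildren: details rows that hang off the heuristic results.
    cur.execute(
        "DELETE FROM yarn_app_heuristic_result_details "
        "WHERE yarn_app_heuristic_result_id IN ("
        "SELECT id FROM yarn_app_heuristic_result "
        f"WHERE yarn_app_result_id IN ({marks}))",
        app_ids,
    )
    # 2. Children: heuristic results that reference the apps.
    cur.execute(
        f"DELETE FROM yarn_app_heuristic_result WHERE yarn_app_result_id IN ({marks})",
        app_ids,
    )
    # 3. Parents: the app rows themselves.
    cur.execute(f"DELETE FROM yarn_app_result WHERE id IN ({marks})", app_ids)
    conn.commit()
    return len(app_ids)


def make_demo_db():
    """Build a tiny in-memory stand-in with the assumed three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE yarn_app_result (id TEXT PRIMARY KEY, finish_time INTEGER);
        CREATE TABLE yarn_app_heuristic_result (
            id INTEGER PRIMARY KEY, yarn_app_result_id TEXT);
        CREATE TABLE yarn_app_heuristic_result_details (
            yarn_app_heuristic_result_id INTEGER, name TEXT, value TEXT);
        INSERT INTO yarn_app_result VALUES
            ('app_1', 1000), ('app_2', 1500), ('app_3', 3000);
        INSERT INTO yarn_app_heuristic_result VALUES
            (1, 'app_1'), (2, 'app_2'), (3, 'app_3');
        INSERT INTO yarn_app_heuristic_result_details VALUES
            (1, 'k', 'v'), (2, 'k', 'v'), (3, 'k', 'v');
        """
    )
    return conn


conn = make_demo_db()
deleted = purge_old_apps(conn, cutoff_ms=2000)
print("apps purged:", deleted)
```

Calling the function repeatedly until it returns 0 would work through a larger backlog in small batches, which keeps each transaction short. Taking a backup before running anything like this against a production database is a sensible precaution.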