linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

[TonyEF] [Enhancement] Tony exception classification #677

Closed: pralabhkumar closed this 4 years ago

pralabhkumar commented 4 years ago

This PR classifies Tony exceptions into user-provided categories.
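
To illustrate the idea, here is a minimal sketch of rule-based classification, assuming that "user-provided categorization" means an ordered list of (regex, category) pairs checked against the collected exception stack traces, where the first matching rule wins. The rule list, the `classify` helper, and the `UNKNOWN` fallback are hypothetical; only the `USER_ERROR/FILE_NOT_FOUND` category string comes from the REST response in this PR. This is not the PR's actual implementation.

```python
import re
from typing import Iterable

# Hypothetical user-provided rules: (regex, category) pairs, checked in order.
# Category strings follow the "PARENT/CHILD" shape seen in the REST response;
# the patterns are illustrative assumptions, not Dr. Elephant's configuration.
DEFAULT_RULES = [
    (r"No such file or directory|FileNotFoundException", "USER_ERROR/FILE_NOT_FOUND"),
    (r"OutOfMemoryError", "USER_ERROR/OUT_OF_MEMORY"),
]


def classify(stack_traces: Iterable[str], rules=DEFAULT_RULES, default: str = "UNKNOWN") -> str:
    """Return the category of the first rule whose pattern matches any stack trace."""
    traces = list(stack_traces)
    for pattern, category in rules:
        compiled = re.compile(pattern)
        if any(compiled.search(trace) for trace in traces):
            return category
    # No rule matched: fall back to a catch-all category.
    return default


if __name__ == "__main__":
    traces = [
        "Job Diagnostics: \nSingle node training failed..",
        "tensorflow.python.framework.errors_impl.NotFoundError: "
        "hdfs://default/user/mbhambha/data/covid-articles-v1/offline-analysis; "
        "No such file or directory",
    ]
    print(classify(traces))
```

Checking rules in list order gives users a simple way to express priority: a more specific pattern placed earlier overrides a broader one placed later.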

Internal Tracking Jira: LIHADOOP-51761

[Screenshot: Dr. Elephant UI]

REST response:

{
  "workflow-exceptions": [
    {
      "name": "cnn-infer",
      "type": "TONY",
      "id": "cnn-infer",
      "applications": [
        {
          "name": "application_1583846963506_38267",
          "exceptionSummary": [
            {
              "exceptionID": 1,
              "exceptionTrackingURL": "",
              "exceptionName": "Job Diagnostics",
              "exceptionSource": "DRIVER",
              "exceptionStackTrace": "Job Diagnostics: \nSingle node training failed.."
            },
            {
              "exceptionID": 625396750,
              "exceptionTrackingURL": "",
              "exceptionName": "Traceback (most recent call last)",
              "exceptionSource": "DRIVER",
              "exceptionStackTrace": "tensorflow.python.framework.errors_impl.NotFoundError: hdfs://default/user/mbhambha/data/covid-articles-v1/offline-analysis; No such file or directory"
            }
          ],
          "exceptionLogSource": "",
          "exceptionClassification": "USER_ERROR/FILE_NOT_FOUND",
          "tasks": []
        }
      ],
      "status": "failed"
    }
  ]
}
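
A client consuming this response might want one row per application with its classification. The sketch below assumes only the field names visible in the response above (`workflow-exceptions`, `id`, `applications`, `name`, `exceptionClassification`, `exceptionSummary`, `exceptionName`); the `summarize` helper is hypothetical, and fetching the response from a Dr. Elephant endpoint is left out because the endpoint path is not given here.

```python
import json


def summarize(response_text: str) -> list:
    """Return (workflow id, application name, classification, exception names) per application.

    Hypothetical helper: it assumes the JSON shape of the REST response shown in this PR.
    """
    payload = json.loads(response_text)
    rows = []
    for workflow in payload.get("workflow-exceptions", []):
        for app in workflow.get("applications", []):
            # Collect the human-readable exception names reported for this application.
            names = [e.get("exceptionName", "") for e in app.get("exceptionSummary", [])]
            rows.append((workflow.get("id"), app.get("name"), app.get("exceptionClassification"), names))
    return rows


if __name__ == "__main__":
    # Trimmed-down copy of the response above.
    sample = json.dumps({
        "workflow-exceptions": [{
            "id": "cnn-infer",
            "status": "failed",
            "applications": [{
                "name": "application_1583846963506_38267",
                "exceptionClassification": "USER_ERROR/FILE_NOT_FOUND",
                "exceptionSummary": [
                    {"exceptionName": "Job Diagnostics"},
                    {"exceptionName": "Traceback (most recent call last)"},
                ],
                "tasks": [],
            }],
        }]
    })
    for row in summarize(sample):
        print(row)
```

Using `.get` with defaults keeps the sketch tolerant of workflows or applications that report no exceptions.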

Testing: The PR was tested with unit test cases. End-to-end (E2E) testing was done on a test machine.

ShubhamGupta29 commented 4 years ago

The message of the initial commit needs to be corrected; it is currently "Conflict Resolution". Kindly try to change it using git commit --amend.

The same applies to the "Test Commit" message of another commit.

pralabhkumar commented 4 years ago

@ShubhamGupta29 I've made the requested changes. Please review.

ShubhamGupta29 commented 4 years ago

It seems the review is done from my side. Before pushing the next changes, kindly remove any redundant lines you added, shorten lines longer than 120 characters, and apply other basic cleanup, as I didn't leave comments for such nitpicks. Thanks.
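
A quick way to find lines over the 120-character limit before pushing is a small script like the one below. This is a hypothetical helper, not part of the Dr. Elephant build; the project may already enforce line length through its own style tooling.

```python
import sys

MAX_LINE_LENGTH = 120  # limit mentioned in the review comment


def long_lines(lines, limit=MAX_LINE_LENGTH):
    """Return (1-based line number, length) for every line longer than `limit` characters."""
    results = []
    for number, line in enumerate(lines, start=1):
        # Ignore the trailing newline (and a Windows carriage return) when measuring.
        length = len(line.rstrip("\r\n"))
        if length > limit:
            results.append((number, length))
    return results


if __name__ == "__main__":
    # Usage: python check_line_length.py path/to/File.java [more files...]
    for name in sys.argv[1:]:
        with open(name, encoding="utf-8") as handle:
            for number, length in long_lines(handle):
                print(f"{name}:{number}: {length} > {MAX_LINE_LENGTH} characters")
```

Taking an iterable of lines rather than a path keeps the check easy to reuse on file handles, `git diff` output, or strings in a test.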

pralabhkumar commented 4 years ago

@ShubhamGupta29 I've addressed all your comments (except the Assert one). Please review.