linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0
1.35k stars 859 forks

Parsing Azkaban job logs for Exception Fingerprinting #672

Closed · ShubhamGupta29 closed this 4 years ago

ShubhamGupta29 commented 4 years ago

DESCRIPTION This PR adds the changes needed to analyze the Azkaban logs of a failed job and extract the exceptions recorded by the scheduler. This analysis is performed when little or no information is available at the application level. In those cases, the scheduler captures exceptions that can be of value to users in finding the root cause of the job failure.
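To illustrate the idea, here is a minimal sketch of pulling exception headers out of a scheduler log. It is not the Analyzer class from this PR: the class name `AzkabanLogExceptionSketch`, the method `extractExceptions`, the regular expressions, and the sample log lines are all hypothetical, and real Azkaban log lines may be formatted differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the Analyzer class added by this PR.
class AzkabanLogExceptionSketch {

    // A fully qualified class name ending in "Exception" or "Error",
    // optionally followed by ": <message>". The pattern is an assumption.
    private static final Pattern EXCEPTION_HEADER = Pattern.compile(
        "((?:[a-zA-Z_$][\\w$]*\\.)+[A-Z][\\w$]*(?:Exception|Error))(?::\\s*(.*))?");

    // A stack frame such as "at com.example.Runner.run(Runner.java:42)".
    private static final Pattern STACK_FRAME = Pattern.compile("\\bat\\s+[\\w.$<>]+\\(");

    /** Returns one entry per exception header found in the log text, in order. */
    static List<String> extractExceptions(String log) {
        List<String> found = new ArrayList<>();
        for (String line : log.split("\\r?\\n")) {
            if (STACK_FRAME.matcher(line).find()) {
                continue; // skip stack frames so only exception headers are kept
            }
            Matcher m = EXCEPTION_HEADER.matcher(line);
            if (m.find()) {
                String message = m.group(2);
                found.add(message == null || message.isEmpty()
                    ? m.group(1)
                    : m.group(1) + ": " + message);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up log lines, for illustration only.
        String log = String.join("\n",
            "myJob INFO - Starting job myJob",
            "myJob ERROR - java.lang.RuntimeException: Job failed with status 1",
            "myJob ERROR - \tat com.example.Runner.run(Runner.java:42)",
            "myJob ERROR - Caused by: java.io.IOException: No space left on device");
        for (String exception : extractExceptions(log)) {
            System.out.println(exception);
        }
    }
}
```

Skipping stack frames keeps one entry per exception, which is convenient if the entries are later grouped or fingerprinted. A real implementation would likely also need to handle multi-line messages and log truncation.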

HOW THESE CHANGES ARE TESTED Unit tests are added for the Analyzer class. The changes were also tested on the EI machine.

pralabhkumar commented 4 years ago

LGTM