linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

Include S3 counters in heuristics #670

Closed astahlman closed 4 years ago

astahlman commented 4 years ago

The Tez heuristics assume that every S3-related counter is prefixed with either "S3A" or "S3N". In our case the prefix is simply "S3_", so none of those counters match and our Mapper heuristics all show 0 bytes read per task.

This updates the Tez heuristics to also include the "S3_*" counters, as the MapReduce heuristics already do.
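To illustrate the effect, here is a minimal Python sketch of prefix-based counter matching. It is not Dr. Elephant's actual code (the project itself is written in Java and Scala). The function name `s3_bytes_read`, the counter names, and the values are all assumptions made up for the example.

```python
# Illustrative sketch only: names and values below are hypothetical,
# not taken from Dr. Elephant or from a real Tez job.
S3_PREFIXES_BEFORE = ("S3A", "S3N")         # prefixes recognized before the change
S3_PREFIXES_AFTER = ("S3A", "S3N", "S3_")   # prefixes recognized after the change


def s3_bytes_read(counters, prefixes):
    """Sum every counter whose name starts with one of `prefixes`
    and ends with BYTES_READ."""
    return sum(
        value
        for name, value in counters.items()
        if name.startswith(prefixes) and name.endswith("BYTES_READ")
    )


# Hypothetical counters for one task on a cluster that reports "S3_" names.
task_counters = {
    "S3_BYTES_READ": 1_048_576,
    "S3_BYTES_WRITTEN": 2_048,
    "HDFS_BYTES_READ": 512,
}

before = s3_bytes_read(task_counters, S3_PREFIXES_BEFORE)
after = s3_bytes_read(task_counters, S3_PREFIXES_AFTER)
print(before, after)
```

With only "S3A" and "S3N" recognized, the "S3_BYTES_READ" counter is skipped and the task appears to have read nothing from S3. Adding "S3_" to the recognized prefixes picks it up.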