Easy CPU Profiling for Apache Spark applications.
The script spark-submit-flamegraph is a wrapper around the standard spark-submit that generates a Flame Graph.
The script is adapted to work on Amazon EMR. On other systems, the following utilities must be present:
- Python (or set the PYTHON environment variable to the Python executable)
- pip (or set the PIP environment variable to the pip utility)

To install:
wget -O /usr/local/bin/spark-submit-flamegraph \
https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph
chmod +x /usr/local/bin/spark-submit-flamegraph
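Outside EMR, it can help to confirm that the prerequisites are on the PATH before the first run. The following is a minimal sketch, not part of the script itself; the helper name check_tool is hypothetical, and the fallback names python2.7 and pip are taken from the defaults documented for PYTHON and PIP.

```shell
#!/bin/sh
# Sketch only: report whether a required tool can be found on the PATH.
# check_tool is a hypothetical helper, not something the script provides.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Honor PYTHON and PIP if they are set, otherwise use the documented defaults.
check_tool "${PYTHON:-python2.7}" || true
check_tool "${PIP:-pip}" || true
```

A "missing" line means either installing the tool or pointing the corresponding environment variable at an existing executable.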
Use spark-submit-flamegraph as a replacement for the spark-submit command.
To configure the script, use the following environment variables:
Environment Variable | Description | Default value |
---|---|---|
SPARK_CMD | Spark command to run | spark-submit |
PYTHON | Path to the Python executable | python2.7 |
PIP | Path to the pip utility | pip |
For example, to profile a Spark shell session, set the SPARK_CMD environment variable:
SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
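The way these variables fall back to their defaults can be pictured with ordinary shell parameter expansion. This is a sketch under the assumption that the settings are resolved this way; it is not the script's actual source, and the function name resolve_settings is hypothetical.

```shell
#!/bin/sh
# Sketch only: use the value from the environment when it is set and
# non-empty, otherwise fall back to the documented default.
resolve_settings() {
  SPARK_CMD="${SPARK_CMD:-spark-submit}"
  PYTHON="${PYTHON:-python2.7}"
  PIP="${PIP:-pip}"
}

resolve_settings
echo "Spark command: $SPARK_CMD"
echo "Python:        $PYTHON"
echo "pip:           $PIP"
```

Because each variable is read independently, several can be overridden in a single invocation by prefixing the command with more than one assignment.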
The script does the following operations to make profiling Spark applications as easy as possible:
- Runs the spark-submit command with the StatsD profiler JAR on its classpath and with configuration that tells the profiler to report statistics back to the InfluxDB instance.
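To make the step above more concrete, here is a dry-run sketch of how such a command line might be assembled. It only prints the command and runs nothing. Everything specific in it is an assumption rather than the script's real code: the function names, the JAR path, the host and port, the database name and credentials, and the exact agent option string, which follows the key=value format described in the statsd-jvm-profiler project's documentation. Only executors are instrumented here, for brevity.

```shell
#!/bin/sh
# Sketch only; not the actual source of spark-submit-flamegraph.
# All paths, ports, and credentials below are hypothetical placeholders.

# Build the -javaagent option string that points the profiler at InfluxDB.
# The key=value format is assumed from the statsd-jvm-profiler documentation.
profiler_agent_opts() {
  printf '%s' "-javaagent:$1=server=$2,port=$3,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark"
}

# Print the command line that would be run (dry run, nothing is executed).
build_spark_command() {
  jar="$1"; host="$2"; port="$3"
  shift 3
  # Files passed with --jars land in the executor's working directory,
  # so the executor-side agent refers to the JAR by file name only.
  agent="$(profiler_agent_opts "$(basename "$jar")" "$host" "$port")"
  echo "${SPARK_CMD:-spark-submit}" \
    --jars "$jar" \
    --conf "spark.executor.extraJavaOptions=${agent}" \
    "$@"
}

build_spark_command /tmp/statsd-jvm-profiler.jar 127.0.0.1 48081 \
  --class com.example.MyJob my-job.jar
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without Spark installed; replacing echo with exec would turn it into a real launcher.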