spektom / spark-flamegraph

Easy CPU Profiling for Apache Spark applications
Apache License 2.0
45 stars 12 forks

How to use this tool with Spark Structured Streaming? #5

Open yzhan-te opened 5 years ago

yzhan-te commented 5 years ago

Hello,

Is there a way to set up this tool to collect metrics for Spark Structured Streaming? Currently we have a Spark job that polls Kafka constantly, so we have to shut it down manually every time. When I do the manual shutdown, the shell script complains that the return code is bad, and no metrics have been collected.

Thanks!

spektom commented 5 years ago

Hi,

The script should collect metrics regardless of the exit code. Please try the following:

See if metrics are collected this way. If the above doesn't help, please attach the output from the script here.

Thanks!

yzhan-te commented 5 years ago

Hi,

Thanks for replying! This is the command I'm using:

/usr/local/bin/spark-submit-flamegraph --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.amazonaws:aws-java-sdk:1.7.4,com.typesafe:config:1.3.4,com.etsy:statsd-jvm-profiler:2.0.0 --conf spark.streaming.stopGracefullyOnShutdown=true --master yarn --deploy-mode cluster  --class com.package.SparkApp spark-scala_2.11-1.0-SNAPSHOT.jar cluster

And I am getting the following when stopping the app:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
[2019-08-09T20:20:07,136864792+0000] Spark has exited with bad exit code (130)
[2019-08-09T20:20:07,154308000+0000] Collecting profiling metrics
[2019-08-09T20:20:07,524389370+0000] No profiling metrics were recorded!

Also, if I kill it with this command:

ps -ef | grep spark | grep java | grep -v grep | grep -v zeppelin | awk '{print $ 2}' | sudo xargs kill -SIGTERM

Then the script doesn't print anything at all. The output is just:

19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
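(A note on the kill command above: the broad `grep spark | grep java` pipeline can catch unrelated JVMs. A more targeted variant, sketched here under the assumption that only one spark-submit client is running, might look like this:)

```shell
# Signal only the spark-submit client process instead of every Java
# process whose command line mentions "spark".
# `xargs -r` keeps kill from running when pgrep matches nothing.
pgrep -f 'org.apache.spark.deploy.SparkSubmit' | xargs -r kill -TERM

# On YARN it is often cleaner to stop the application itself
# (the application id below is a made-up example; find the real one
# with `yarn application -list`):
# yarn application -kill application_1565380000000_0001
```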
spektom commented 5 years ago

Can you please try the latest version from master? (f66976e70)

spektom commented 5 years ago

Should be fixed with latest master.

yzhan-te commented 5 years ago

Hi there,

I tried the latest version and it's still not working. When I exit the job, it stays in the "Collecting profiling metrics" state for about 5 seconds and then ends with "No profiling metrics were recorded!". Are there any specific calls I need to add to my code to get it to collect metrics? Also, I am running this on an EMR cluster in yarn-cluster mode. Do I need to make any changes to the worker machines?

Thanks!

spektom commented 5 years ago

Hi @yzhan-te,

No special configuration is needed; it's added automatically by the script. Here's what you can check:

Please report back if something is not as described above. Thanks!

yzhan-te commented 5 years ago

Yep the settings are there:

-arg --driver-java-options --arg -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<my_ip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --properties-file 

The port is reachable too. Is there anything else you'd like me to look at specifically?
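(For reference, one quick way to probe that port over HTTP — a sketch with a placeholder host; as far as I can tell the script starts an embedded InfluxDB, and InfluxDB's /ping endpoint answers HTTP 204 when it is up:)

```shell
# Replace with the host from the -javaagent settings above.
PROFILER_HOST="127.0.0.1"

# InfluxDB answers /ping with HTTP 204 when healthy; curl prints 000
# if the endpoint is unreachable. `|| true` ignores curl's non-zero
# exit when nothing is listening.
curl -s -o /dev/null -w '%{http_code}\n' \
  "http://${PROFILER_HOST}:48081/ping" || true
```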

spektom commented 5 years ago

Can you look at the Java process command line (ps -ef | grep java) and see whether these Java properties are present?
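(Something like the following would surface the agent flag if it is present — a sketch; the pattern simply matches the statsd-jvm-profiler javaagent argument shown earlier in the thread:)

```shell
# Print any statsd-jvm-profiler -javaagent argument attached to a
# running JVM; falls back to a note when no agent is found.
ps -ef | grep java | grep -v grep \
  | grep -o 'javaagent:[^ ]*statsd-jvm-profiler[^ ]*' \
  || echo "no profiler agent attached"
```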

yzhan-te commented 5 years ago

Looks like it's also there:

hadoop    7290  7160 51 21:12 pts/0    00:00:27 /etc/alternatives/jre/bin/java -cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/ org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --conf spark.memory.storageFraction=0.1 --conf spark.memory.fraction=0.9 --conf spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --conf spark.streaming.stopGracefullyOnShutdown=true --class com.package.SparkApp --jars /home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.typesafe:config:1.3.4 --num-executors 4 --executor-cores 4 --executor-memory 3GB spark-scala_2.11-1.0-SNAPSHOT.jar cluster --driver-java-options -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark
spektom commented 5 years ago

This is the Spark client application command line. What's interesting are the driver and executor processes themselves. Could you get those as well? Thanks!
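(For context: in yarn-cluster mode the driver runs inside the YARN ApplicationMaster container on a worker node, and each executor runs as a CoarseGrainedExecutorBackend JVM. A sketch of how to spot them, run on an EMR core node:)

```shell
# Run on a worker (core) node hosting the containers. In cluster
# mode the driver lives inside the ApplicationMaster container;
# executors run as CoarseGrainedExecutorBackend JVMs.
ps -ef | grep -E 'CoarseGrainedExecutorBackend|ApplicationMaster' \
  | grep -v grep \
  || echo "no Spark containers on this node"
```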