yzhan-te opened this issue 5 years ago
Hi,
The script should collect metrics regardless of exit code. Please try the following:
--conf spark.streaming.stopGracefullyOnShutdown=true
ps -ef | grep spark | grep java | grep -v grep | awk '{print $2}' | xargs kill -SIGTERM
See if metrics are collected this way. If the above doesn't help, please attach the output from the script here.
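As a side note, a slightly more targeted variant of the kill (a sketch, assuming the Spark client is launched via SparkSubmit on the same machine):
pgrep -f org.apache.spark.deploy.SparkSubmit | xargs -r kill -TERM    # -r: do nothing if no process matches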
Thanks!
Hi,
Thanks for replying! This is the command I'm using:
/usr/local/bin/spark-submit-flamegraph --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.amazonaws:aws-java-sdk:1.7.4,com.typesafe:config:1.3.4,com.etsy:statsd-jvm-profiler:2.0.0 --conf spark.streaming.stopGracefullyOnShutdown=true --master yarn --deploy-mode cluster --class com.package.SparkApp spark-scala_2.11-1.0-SNAPSHOT.jar cluster
And I am getting the following when stopping the app:
19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
[2019-08-09T20:20:07,136864792+0000] Spark has exited with bad exit code (130)
[2019-08-09T20:20:07,154308000+0000] Collecting profiling metrics
[2019-08-09T20:20:07,524389370+0000] No profiling metrics were recorded!
Also if I kill it with this command:
ps -ef | grep spark | grep java | grep -v grep | grep -v zeppelin | awk '{print $2}' | sudo xargs kill -SIGTERM
Then the script doesn't print anything at all; the output ends with just:
19/08/09 20:20:06 INFO ShutdownHookManager: Shutdown hook called
19/08/09 20:20:06 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-dca35798-2712-4f86-b376-8463cb5d38fa
19/08/09 20:20:07 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-5e901536-9a80-4d27-98bf-6892d0ad6429
Can you please try the latest version from master? (f66976e70)
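For example (a sketch, assuming the script was installed from a git checkout of this repository):
git pull origin master    # or pin the exact commit:
git checkout f66976e70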
Should be fixed with latest master.
Hi there,
I tried the latest version and it's still not working. When I exit the job, it stays in the "Collecting profiling metrics" state for about 5 seconds and then ends with "No profiling metrics were recorded!". Are there any specific commands that I need to put in my code to get it to collect metrics? Also, I am running this on an EMR cluster in yarn-cluster mode. Do I need to make any changes to the worker machines?
Thanks!
Hi @yzhan-te,
No special configuration is needed; it's added automatically by the script. What can be checked:
1. The Spark application is launched with the profiler agent attached:
-javaagent:statsd-jvm-profiler.jar=server=<server>,port=<port>,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark
You can validate that by looking at the Java command line parameters: ps -ef | grep java
2. server= and port= in the above configuration contain a reachable hostname and port. The IP address is detected by the spark-submit-flamegraph script, and the port number is chosen randomly. This host and port are where the script is running, and all Spark components report their metrics through it.
Please report back if something is not as described above. Thanks!
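P.S. A quick way to verify that the host and port are reachable from one of the worker nodes (a sketch; netcat has to be available there):
nc -vz <server> <port>    # prints a success message and exits 0 if the port is open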
Yep, the settings are there:
-arg --driver-java-options --arg -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<my_ip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --properties-file
The port is reachable too. Are there any other things you want me to look at specifically?
Can you look at the Java process command line (ps -ef | grep java) and see whether these Java properties are present there?
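For instance, to print just the agent flags (a sketch relying on GNU grep's -o):
ps -ef | grep -o '[j]avaagent[^ ]*'    # the [j] keeps the grep process itself out of the matches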
Looks like it's also there:
hadoop 7290 7160 51 21:12 pts/0 00:00:27 /etc/alternatives/jre/bin/java -cp /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf/:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/lib/spark/conf/:/usr/lib/spark/jars/*:/etc/hadoop/conf/ org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode cluster --conf spark.memory.storageFraction=0.1 --conf spark.memory.fraction=0.9 --conf spark.executor.extraJavaOptions=-javaagent:statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark --conf spark.streaming.stopGracefullyOnShutdown=true --class com.package.SparkApp --jars /home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.2,com.thesamet.scalapb:scalapb-runtime_2.11:0.9.0,redis.clients:jedis:3.1.0,com.typesafe:config:1.3.4 --num-executors 4 --executor-cores 4 --executor-memory 3GB spark-scala_2.11-1.0-SNAPSHOT.jar cluster --driver-java-options -javaagent:/home/hadoop/.spark-flamegraph/statsd-jvm-profiler.jar=server=<myip>,port=48081,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=sparkapp,tagMapping=spark
This is the Spark client application command line. What's interesting is the driver and executor processes themselves. Could you get them as well? Thanks!
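For reference, on the YARN nodes the driver and executor JVMs can usually be spotted with something like this (a sketch; the exact class names vary between Spark versions):
ps -ef | grep -E 'ApplicationMaster|CoarseGrainedExecutorBackend' | grep -v grep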
Hello,
Is there a way to set up this tool to collect metrics for Spark Structured Streaming? Currently we have a Spark job that pulls from Kafka constantly, and we have to shut it down manually every time. When I do the manual shutdown, the shell script complains that the return code is bad and no metrics have been collected.
Thanks!
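For reference, one workaround we might try (a hypothetical sketch, not something we currently run) is a wrapper that forwards termination signals to the spark-submit-flamegraph client, so its metric-collection step still gets a chance to run:
#!/bin/bash
# Hypothetical wrapper: forward INT/TERM to the child so the
# script's own shutdown/metric-collection logic can execute.
/usr/local/bin/spark-submit-flamegraph "$@" &
child=$!
trap 'kill -TERM "$child" 2>/dev/null' INT TERM
wait "$child"    # interrupted when the signal arrives
wait "$child"    # wait again for the child to finish cleanup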