qubole / sparklens

Qubole Sparklens tool for performance tuning Apache Spark
http://sparklens.qubole.com
Apache License 2.0

How does source=history option work? #30

Open Shasidhar opened 5 years ago

Shasidhar commented 5 years ago

I am trying to run Sparklens on the event logs of my application.

I am using the following command:

./bin/spark-submit \
    --packages qubole:sparklens:0.2.0-s_2.11 \
    --master local[0] \
    --class com.qubole.sparklens.app.ReporterApp \
    qubole-dummy-arg file:///Users/shasidhar/interests/sparklens/eventlog.txt source=history

I see the following output in the console:

Ivy Default Cache set to: /Users/shasidhar/.ivy2/cache
The jars for the packages stored in: /Users/shasidhar/.ivy2/jars
:: loading settings :: url = jar:file:/Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
qubole#sparklens added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found qubole#sparklens;0.2.0-s_2.11 in spark-packages
:: resolution report :: resolve 177ms :: artifacts dl 5ms
    :: modules in use:
    qubole#sparklens;0.2.0-s_2.11 from spark-packages in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 1 already retrieved (0kB/6ms)
2019-01-03 15:46:11 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Local jar /Users/shasidhar/interests/spark/spark-2.3.0-bin-hadoop2.7/qubole-dummy-arg does not exist, skipping.

2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2019-01-03 15:46:52 INFO  ShutdownHookManager:54 - Deleting directory /private/var/folders/3t/rfd2djjs1yg30mhmw8z_s7tw0000gp/T/spark-7a992110-6a4f-44f4-9473-1ddade11b53a

What exactly do I need to look at after this? Does it generate a sparklens JSON file? If yes, where can I find the output file?

iamrohit commented 5 years ago

Hi @Shasidhar,

I would expect this to print the usual Sparklens report on the console. We don't support converting an event history file to Sparklens JSON yet (we will be adding this soon). Here is how we generate sparklens.json from a running application:

--packages qubole:sparklens:0.2.0-s_2.11
--conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener
--conf spark.sparklens.reporting.disabled=true
--conf spark.sparklens.data.dir=/dir/for/saving/sparklens.json
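
For context, here is a sketch of how these options might be combined into a single spark-submit invocation. The application class `com.example.MyApp` and the jar `my-app.jar` are hypothetical placeholders for your own application; the `--packages` and `--conf` values are copied from the options above.

```shell
# Sketch only: com.example.MyApp and my-app.jar are hypothetical placeholders.
# The Sparklens options are the ones listed in the comment above.
./bin/spark-submit \
    --packages qubole:sparklens:0.2.0-s_2.11 \
    --conf spark.extraListeners=com.qubole.sparklens.QuboleJobListener \
    --conf spark.sparklens.reporting.disabled=true \
    --conf spark.sparklens.data.dir=/dir/for/saving/sparklens.json \
    --class com.example.MyApp \
    my-app.jar
```
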

Shasidhar commented 5 years ago

@iamrohit Understood. For some reason I don't see the report, then.

iamrohit commented 5 years ago

@Shasidhar Maybe something is wrong with your event log file? Can you try running with this file [sparklens/src/test/event-history-test-files/local-1532512550423] and check whether you still get no results?
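
One quick sanity check on an event log, offered as a sketch rather than a confirmed diagnosis: Spark event logs are newline-delimited JSON, with each line carrying an `"Event"` field such as `SparkListenerApplicationStart` or `SparkListenerApplicationEnd`. The helper below counts the lines for a given event type. It assumes the log is uncompressed, and the idea that a missing application-end event could explain an empty report is an assumption, not something stated in this thread. `count_event` and the path in the usage comment are hypothetical names.

```shell
# Count how many lines of an uncompressed Spark event log record a given event type.
# count_event is a hypothetical helper name, not part of Spark or Sparklens.
count_event() {
    log_file=$1
    event_name=$2
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps the function from reporting failure.
    grep -c -F "\"Event\":\"${event_name}\"" "$log_file" || true
}

# Usage (hypothetical path; substitute your own event log):
# count_event /path/to/eventlog.txt SparkListenerApplicationStart
# count_event /path/to/eventlog.txt SparkListenerApplicationEnd
```

A count of 0 for either event would suggest the log is incomplete, for example because the application was still running when the file was copied.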

Shasidhar commented 5 years ago

@iamrohit Yes, it looks like an issue with my event logs. I will figure it out, thanks. Is there an issue I can follow for the feature that will generate the sparklens.json file from event logs?