kartiknooli opened this issue 5 years ago
There is a logs directory one level above your dr-elephant folder that I didn't see you list:
$DR_ELEPHANT_DIR/../logs/elephant/dr_elephant.log
Thanks @ColinArmstrong for the response. I did check, and here is the log. This time I reran another Spark job on the cluster and noticed that the Dr. Elephant UI says it is a Hadoop job and doesn't identify it as a Spark job. The dr_elephant.log file does not give me any error messages. Is my understanding wrong about how Dr. Elephant displays Spark jobs on the UI?
When I filter the jobs on the UI by Job Type = Spark, it returns no results.
thanks, Kartik.
Is HTTPS enabled on YARN? If HTTPS is not enabled, then use the steps below to get it working:
- Inject exports of SPARK_HOME and SPARK_CONF_DIR in the ./bin/start.sh file.
- Make sure you have the Spark client installed as a component if you are using a vendor-specific distribution.
- Update the Spark fetcher configuration to com.linkedin.drelephant.spark.fetchers.SparkFetcher in the conf file app-conf/FetcherConf.xml. By default it is commented out.
This should get Dr. Elephant working against Spark jobs.
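The first step above can be sketched as follows; the paths are typical EMR locations and are an assumption here, so adjust them for your distribution:

```shell
# Inject these near the top of ./bin/start.sh, before dr-elephant is
# launched, so the process can locate the Spark client and its config.
# Both paths below are assumptions (typical for EMR).
export SPARK_HOME=/usr/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf
```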
@kartiknooli
To find the dr_elephant.log, use `locate dr_elephant.log`.
In my case, to start getting Spark jobs I had to change the fetcher entry in app-conf/FetcherConf.xml.
Our Spark event log dir is configured as hdfs:///spark-history, so we added:
`<event_log_dir>webhdfs:///spark-history</event_log_dir>`
And we commented out the fetcher lines that were enabled by default.
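A sketch of what such a fetcher entry can look like, assuming the REST-based SparkFetcher; the tag names follow Dr. Elephant's FetcherConf.xml format, and the event_log_dir value is the one mentioned above:

```xml
<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <!-- point the fetcher at the history dir via WebHDFS -->
    <event_log_dir>webhdfs:///spark-history</event_log_dir>
  </params>
</fetcher>
```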
@shahrukhkhan489 and @lubomir-angelov thanks for the response.
I tried making the suggested changes.
Inject exports of SPARK_HOME and SPARK_CONF_DIR in the ./bin/start.sh file. I hope you meant the following:
export SPARK_HOME=/usr/lib/spark
export SPARK_CONF_DIR=/etc/spark/conf
Please correct me if I am wrong.
Make sure you have the Spark client installed as a component if you are using a vendor-specific distribution. We have the Spark client bootstrapped with EMR:
(pyspark startup banner: Spark version 2.1.1; Python 2.7.12; SparkSession available as 'spark')
```
<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
  <params>
    <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
    <should_process_logs_locally>true</should_process_logs_locally>
  </params>
</fetcher>
```
I tried with and without adding the HDFS path for the event logs; neither worked.
Here is the error message I got from the logs:
11-26-2018 19:24:35 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.ElephantRunner : Analyzing SPARK application_1520505558307_35023
11-26-2018 19:24:35 INFO [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35027
11-26-2018 19:24:35 INFO [dr-el-executor-thread-2] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Fetching data for application_1520505558307_35023
11-26-2018 19:24:35 INFO [ForkJoinPool-1-worker-5] com.linkedin.drelephant.spark.fetchers.SparkRestClient : calling REST API at http://hostname:18080/api/v1/applications/application_1520505558307_35023
11-26-2018 19:24:35 ERROR [ForkJoinPool-1-worker-9] com.linkedin.drelephant.spark.fetchers.SparkRestClient : error reading applicationInfo http:hostname:18080/api/v1/applications/application_1520505558307_35027. Exception Message = HTTP 404 Not Found
11-26-2018 19:24:35 WARN [dr-el-executor-thread-1] com.linkedin.drelephant.spark.fetchers.SparkFetcher : Failed fetching data for application_1520505558307_35027. I will retry after some time! Exception Message is: HTTP 404 Not Found
Appreciate your help with this.
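The failing call in the log above can be reproduced by hand. A small sketch that builds the same REST URL SparkRestClient requests; the host `hostname:18080` is a placeholder taken from the log, and `app_url`/`app_status` are hypothetical helper names:

```shell
# Build the Spark history server REST URL that SparkRestClient requests.
# SHS_HOST is a placeholder; substitute your history server address.
SHS_HOST="hostname:18080"

app_url() {
  echo "http://${SHS_HOST}/api/v1/applications/$1"
}

# Fetch only the HTTP status code; a 404 here means the history server
# has no record of that application's event log.
app_status() {
  curl -s -o /dev/null -w "%{http_code}" "$(app_url "$1")"
}

app_url "application_1520505558307_35027"
```

Hitting the URL with `app_status` (or a browser) independently of Dr. Elephant tells you whether the 404 comes from the history server itself rather than from the fetcher.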
It looks like your Spark history server is not responding for those applications.
I think you need a patched version of the SHS to get Spark 2 jobs registered: https://github.com/linkedin/dr-elephant/issues/327
@kartiknooli The 404 error indicates that the logs for that application have been rolled off the history server. This might not be the same case with all Spark applications:
error reading applicationInfo http:hostname:18080/api/v1/applications/application_1520505558307_35027. Exception Message = HTTP 404 Not Found
Try opening the same link in a browser. You will see that http:hostname:18080/api/v1/applications/application_1520505558307_35027 doesn't exist.

The dr-elephant web UI shows no Spark jobs. I am on dr-elephant version 2.1.7, Hadoop 3.0.0, Spark 1.6, with the following in app-conf/FetcherConf.xml:
```
<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>org.apache.spark.deploy.history.SparkFSFetcher</classname>
</fetcher>
```
hello, I am having a similar issue to the ones a lot of others mentioned, but none of those tickets helped me resolve it. My Spark jobs won't show up on the Dr. Elephant UI; I can only see MapReduce jobs. I went through this thread but could not figure out where to find the Dr. Elephant logs for the Spark jobs. I am on EMR with Hadoop 2.7.3 and Spark 2.1.1. All the configs you mentioned above exist in my cluster. I can see the running Spark job on the Resource Manager UI, as well as in the Spark history server once it's completed.
spark.yarn.historyServer.address  ip-10-XX-XX-X.ec2.internal:18080
spark.eventLog.dir                hdfs:///var/log/spark/apps

Here is how my dr-elephant folder looks:
```
drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 24 16:29 app-conf
drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 17 22:29 bin
drwxr-xr-x 3 ec2-user ec2-user  4096 Oct 17 22:29 conf
-rwxr-xr-x 1 ec2-user ec2-user  1199 Oct 24 16:30 dr.log
drwxr-xr-x 2 ec2-user ec2-user 16384 Oct 17 22:29 lib
drwxr-xr-x 2 ec2-user ec2-user  4096 Oct 24 16:31 logs
-rwxr-xr-x 1 ec2-user ec2-user  2925 Oct 17 22:26 README.md
-rw-r--r-- 1 root     root         5 Oct 24 16:30 RUNNING_PID
drwxr-xr-x 3 ec2-user ec2-user  4096 Oct 17 22:29 scripts
drwxr-xr-x 3 ec2-user ec2-user  4096 Oct 17 22:29 share
```
```
$ echo $SPARK_HOME
/usr/lib/spark
$ echo $SPARK_CONF_DIR
/usr/lib/spark/conf
```
Am I missing something here? Please help.
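One thing worth checking against the config above: per the earlier advice in this thread, the fetcher's event_log_dir uses the webhdfs scheme, so an hdfs:// dir like spark.eventLog.dir would be supplied as webhdfs://. A tiny sketch of that rewrite; `to_webhdfs` is a hypothetical helper for illustration:

```shell
# Rewrite an hdfs:// event-log URI into the webhdfs:// form suggested
# earlier in this thread for Dr. Elephant's event_log_dir parameter.
# to_webhdfs is a hypothetical helper, not part of Dr. Elephant.
to_webhdfs() {
  echo "$1" | sed 's|^hdfs://|webhdfs://|'
}

to_webhdfs "hdfs:///var/log/spark/apps"   # -> webhdfs:///var/log/spark/apps
```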
thanks, Kartik.