Open AbdelrahmanMosly opened 1 year ago
Hi AbdelrahmanMosly,
I am facing an issue while trying to fetch the Spark event logs from the local file system. It gives me the error below:

[info] play - Listening for HTTP on /0:0:0:0:0:0:0:0:9001
Event Log Location URI: /Users/shaikbasha/spark-events
java.io.FileNotFoundException: File /Users/shaikbasha/spark-events does not exist.
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1144)
at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1067)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1063)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1063)
at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2002)
at org.apache.hadoop.fs.FileSystem$5.(FileSystem.java:2129)
at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:2127)
at com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2.fetchAnalyticsJobsFromEventLogs(AnalyticJobGeneratorHadoop2.java:261)
at com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2.fetchAnalyticJobs(AnalyticJobGeneratorHadoop2.java:291)
at com.linkedin.drelephant.ElephantRunner$1.run(ElephantRunner.java:190)
at com.linkedin.drelephant.ElephantRunner$1.run(ElephantRunner.java:153)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:360)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1918)
at com.linkedin.drelephant.security.HadoopSecurity.doAs(HadoopSecurity.java:109)
at com.linkedin.drelephant.ElephantRunner.run(ElephantRunner.java:153)
at com.linkedin.drelephant.DrElephant.run(DrElephant.java:67)
at java.lang.Thread.run(Thread.java:750)

When I run Dr. Elephant with HDFS it works fine and the event-log data shows up in the UI, but I want to fetch the spark-events from the local file system (or any directory, not necessarily HDFS, e.g. gs://). Is that possible? Can you please help me with this? It is a bit urgent, so please look into it as soon as possible.
@Javid-Shaik
- File Existence: Verify that the directory /Users/shaikbasha/spark-events exists and contains the Spark event logs.
- Fetcher Configuration: Update the fetcher configuration in Dr. Elephant so that it points to the local file system path. This involves editing the configuration files: in app-conf/FetcherConf.xml, make sure event_log_location_uri is set to the correct local path (see the sketch after this list).
- Permissions: Ensure that the user running Dr. Elephant has the necessary permissions to read the files in the specified directory.
- Configuration Files: Make sure all other necessary configurations are correctly set up as per the Dr. Elephant setup instructions.
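For reference, a minimal fetcher entry might look like the following. This is a sketch assuming the usual FetcherConf.xml layout and the stock FSFetcher; the path is a placeholder to replace with your own:

<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.FSFetcher</classname>
  <params>
    <event_log_location_uri>file:///path/to/spark-events</event_log_location_uri>
  </params>
</fetcher>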
@AbdelrahmanMosly
The directory exists and the Spark event logs are present in it:

shaikbasha@C02G144RMD6M dr-elephant-2.1.7 % ls /Users/shaikbasha/spark-events | tail -n 10
spark-034778d6e9844b97b5fc4217197e0d91
spark-19ce088f2e7b4443a09b32ee1082e546
spark-46a7d8db9504453a816a6d1a98884709
spark-4a5a8432e5c7452e8638de54c8db1297
spark-6befa09607c249e2aa0fc5d2e650f814
spark-a823dfda6b6d4d7481a2f3065de0201e
spark-dcf713ca380a41ffbfc578e379c50f59

FetcherConf.xml:

<fetcher>
  <applicationtype>spark</applicationtype>
  <classname>com.linkedin.drelephant.spark.fetchers.FSFetcher</classname>
  <params>
    <event_log_location_uri>/Users/shaikbasha/spark-events</event_log_location_uri>
  </params>
</fetcher>

Permissions are provided:

shaikbasha@C02G144RMD6M dr-elephant-2.1.7 % ls -l /Users/shaikbasha/spark-events | tail -n 5
-rw-r--r-- 1 shaikbasha staff 51674335 Jun 21 10:34 spark-46a7d8db9504453a816a6d1a98884709
-rw-r--r-- 1 shaikbasha staff 87653 Jun 20 18:02 spark-4a5a8432e5c7452e8638de54c8db1297

And I have configured the Spark and Hadoop configuration files correctly.
Please help me with this. As I have already mentioned, Dr. Elephant works fine with HDFS; I want it to work with the local FS.
The error below is from the dr_elephant.log file:

06-24-2024 20:28:10 INFO [Thread-8] com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 : Event log directory file:///Users/shaikbasha/spark-events
06-24-2024 20:28:10 ERROR [Thread-8] com.linkedin.drelephant.ElephantRunner : Error fetching job list. Try again later...
java.lang.IllegalArgumentException: Wrong FS: file:/Users/shaikbasha/spark-events, expected: hdfs://localhost:8020
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:779)
@Javid-Shaik
First, I recommend you check this commit: https://github.com/linkedin/dr-elephant/pull/716/commits/eb6092bff701e4c4afea34eb8e22c23983934781
Based on the error message Wrong FS: file:/Users/shaikbasha/spark-events, expected: hdfs://localhost:8020, it seems that Dr. Elephant is configured to expect HDFS by default, but you're trying to fetch logs from the local file system. This discrepancy causes the error.
Check core-site.xml Configuration:
- Ensure that the Hadoop configuration (core-site.xml) is set up to handle local file system paths.
- You might need to set fs.defaultFS to file:/// for the local file system. Here's an example configuration for core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>file:///</value>
</property>
</configuration>
Check for Hardcoded HDFS References:
- Review the Dr. Elephant source code and configurations for any hardcoded references to HDFS, and make sure they are flexible enough to support the local file system.
Ensure Correct FileSystem Class:
- Make sure the correct FileSystem implementation is being used. For the local file system, LocalFileSystem should be used instead of HDFS; you may need to set this explicitly in your configuration, as in the sketch below.
Restart Dr. Elephant:
- After making these changes, restart Dr. Elephant to apply the new configurations.
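If you do need to pin the implementation, a minimal sketch for core-site.xml; fs.file.impl is a standard Hadoop key, but whether you actually need to set it depends on your distribution:

<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
</property>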
Thank you @AbdelrahmanMosly
After changing fs.defaultFS to file:/// in core-site.xml I was able to get the data into the Dr. Elephant UI.
I also observed that we only need to start the Spark history server; there is no need to start the mr-jobhistory-server.
But now I am getting this error in the dr.log:

Event Log Location URI: /Users/shaikbasha/spark-events
[error] o.a.s.s.ReplayListenerBus - Exception parsing Spark event log: file:/Users/shaikbasha/spark-events/spark-034778d6e9844b97b5fc4217197e0d91
org.json4s.package$MappingException: Did not find value which can be converted into boolean
at org.json4s.reflect.package$.fail(package.scala:96) ~[org.json4s.json4s-core_2.10-3.2.10.jar:3.2.10]
[error] o.a.s.s.ReplayListenerBus - Malformed line #9: {"Event":"SparkListenerJobStart" ...... }

Can you please help me resolve this error?
@Javid-Shaik I believe the issue is related to differences in Spark versions. Spark 1.x, 2.x, and 3.x have variations in the event listeners they use. Dr. Elephant was originally designed for Spark 1.x and was later adapted for Spark 2.x in some pull requests.
If you check my PR, you'll see that to make Dr. Elephant compatible with Spark 3.x, I had to modify the listeners. Spark 3.x introduced new listeners and removed some of the existing ones, which required adjustments in the event log parsing logic.
Additionally, you can check these commits:
https://github.com/linkedin/dr-elephant/pull/716/commits/71e6f2c4da0b8ea7f29521af75eb886eab54f508
https://github.com/linkedin/dr-elephant/pull/716/commits/a1e6c67c72e7ace5ed9ddbcd33543ccedeb71250
@AbdelrahmanMosly
I have done everything you told me to do, but now I am getting this new error along with the previous one:

[error] o.a.s.s.ReplayListenerBus - Exception parsing Spark event log: file:/Users/shaikbasha/spark-events/spark-46a7d8db9504453a816a6d1a98884709
scala.MatchError: org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart (of class java.lang.String)
at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:466) ~[org.apache.spark.spark-core_2.10-1.4.0.jar:1.4.0]
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) ~[org.apache.spark.spark-core_2.10-1.4.0.jar:1.4.0]
at org.apache.spark.deploy.history.SparkDataCollection.load(SparkDataCollection.scala:310) [com.linkedin.drelephant.dr-elephant-2.1.7.jar:2.1.7]
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$doFetchData$1.apply(SparkFSFetcher.scala:105) [com.linkedin.drelephant.dr-elephant-2.1.7.jar:2.1.7]
at org.apache.spark.deploy.history.SparkFSFetcher$$anonfun$doFetchData$1.apply(SparkFSFetcher.scala:104) [com.linkedin.drelephant.dr-elephant-2.1.7.jar:2.1.7]
at scala.Function1$$anonfun$andThen$1.apply(Function1.scala:55) [org.scala-lang.scala-library-2.10.4.jar:na]
[error] o.a.s.s.ReplayListenerBus - Malformed line #5: {"Event":"org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart" ... }

And the UI, as in the pictures above, is showing the wrong data. Please help me with this.
@Javid-Shaik
Make sure the metrics you need are present in the Spark event log.
I am getting confused about which Spark version you use: this error, [org.apache.spark.spark-core_2.10-1.4.0.jar:1.4.0], suggests you are using Spark 1.4.
This should work straightforwardly; as far as I remember, there is no need to play with replay listeners.
If your whole problem was just reading from the local file system, you only need to change configs. There is no need to change the code.
@AbdelrahmanMosly
Well, I am using Spark 3.5.1.
I compiled Dr. Elephant with the default Spark version, 1.4.0.
@Javid-Shaik
There are discrepancies in the event logs due to differences in event types between Spark versions.
Identify Missing Events: work out which event types in your Spark 3.5.1 logs are unknown to the Spark 1.4.0 parser bundled with Dr. Elephant.
Customize Event Parsing: adapt the event-parsing logic so that those events are handled, or at least safely skipped.
Congratulations on getting the basic UI and some events parsed! The next step is customizing the event parsing so that all the necessary data is captured from Spark 3.5.1 logs.
@AbdelrahmanMosly
First of all, thank you for your prompt assistance and for providing clear directions to solve the problems.
I have identified some new events that are not present in Spark 1.4.0, i.e. newly added in Spark 3.5.1:
1. org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart
2. org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionEnd
3. org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate
4. org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveSQLMetricUpdates
5. org.apache.spark.sql.execution.ui.SparkListenerDriverAccumUpdates
6. SparkListenerJobStart (the structure of this event has changed)
Can you please give me a head start on updating Dr. Elephant's event-parsing logic to handle the new or renamed events in Spark 3.5.1?
If needed, I will share the event structure.
@Javid-Shaik You can check the code present in SparkDataCollection.scala. Additionally, look at the documentation of Spark's replay listener to understand how to catch those listeners.
In the worst-case scenario, you can parse the JSON of the event logs directly.
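As a rough illustration of that worst-case approach (parsing the event-log JSON directly), here is a minimal sketch in Scala using json4s. The object name is hypothetical, and the log path is just the one from this thread; this is not Dr. Elephant's actual parsing code:

import scala.io.Source
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object EventLogScan {
  implicit val formats: Formats = DefaultFormats

  def main(args: Array[String]): Unit = {
    // A Spark event log is one JSON object per line; count the event types
    // so you can see which ones the old parser would choke on.
    val lines = Source.fromFile("/Users/shaikbasha/spark-events/spark-034778d6e9844b97b5fc4217197e0d91").getLines()
    val counts = lines
      .map(line => (parse(line) \ "Event").extract[String])
      .foldLeft(Map.empty[String, Int]) { (acc, ev) => acc.updated(ev, acc.getOrElse(ev, 0) + 1) }
    counts.toSeq.sortBy(-_._2).foreach { case (ev, n) => println(f"$n%6d  $ev") }
  }
}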
@AbdelrahmanMosly
I have observed that the events SparkListenerJobStart and SparkListenerStageSubmitted have changed structure between Spark 1.4.0 and 3.5.1. For example, fields such as "DeterministicLevel":"DETERMINATE" are present in Spark 3.5.1 but not in Spark 1.4.0, along with several other modifications.
As you suggested, I have looked at the code in SparkDataCollection.scala, but I am not sure what to modify or where.
Could you please help me identify the relevant sections of the code and recommend how to adjust the parsing logic to handle these discrepancies between the Spark versions?
For reference, compare the event below from Spark 3.5.1:
{"Event":"SparkListenerStageSubmitted","Stage Info":{"Stage ID":0,"Stage Attempt ID":0,"Stage Name":"reduce at SparkPi.scala:38","Number of Tasks":2,"RDD Info":[{"RDD ID":1,"Name":"MapPartitionsRDD","Scope":"{\"id\":\"1\",\"name\":\"map\"}","Callsite":"map at SparkPi.scala:34","Parent IDs":[0],"Storage Level":{"Use Disk":false,"Use Memory":false,"Use Off Heap":false,"Deserialized":false,"Replication":1},"Barrier":false,"DeterministicLevel":"DETERMINATE","Number of Partitions":2,"Number of Cached Partitions":0,"Memory Size":0,"Disk Size":0},{"RDD ID":0,"Name":"ParallelCollectionRDD","Scope":"{\"id\":\"0\",\"name\":\"parallelize\"}","Callsite":"parallelize at SparkPi.scala:34","Parent IDs":[],"Storage Level":{"Use Disk":false,"Use Memory":false,"Use Off Heap":false,"Deserialized":false,"Replication":1},"Barrier":false,"DeterministicLevel":"DETERMINATE","Number of Partitions":2,"Number of Cached Partitions":0,"Memory Size":0,"Disk Size":0}],"Parent IDs":[],"Details":"some details","Submission Time":1715204029859,"Accumulables":[],"Resource Profile Id":0,"Shuffle Push Enabled":false,"Shuffle Push Mergers Count":0},"Properties":{"spark.rdd.scope":"{\"id\":\"2\",\"name\":\"reduce\"}","resource.executor.cores":"1","spark.rdd.scope.noOverride":"true"}}
with the same event from an older Spark (note "Use ExternalBlockStore" and the absence of "DeterministicLevel"):
{"Event":"SparkListenerStageSubmitted","Stage Info":{"Stage ID":0,"Stage Attempt ID":0,"Stage Name":"reduce at pi.py:39","Number of Tasks":10,"RDD Info":[{"RDD ID":1,"Name":"PythonRDD", "Parent IDs":[0],"Storage Level":{"Use Disk":false,"Use Memory":false,"Use ExternalBlockStore":false,"Deserialized":false,"Replication":1}, "Number of Partitions":10,"Number of Cached Partitions":0,"Memory Size":0,"ExternalBlockStore Size":0,"Disk Size":0},{"RDD ID":0,"Name":"ParallelCollectionRDD","Scope":"{\"id\":\"0\",\"name\":\"parallelize\"}","Parent IDs":[],"Storage Level":{"Use Disk":false,"Use Memory":false,"Use ExternalBlockStore":false,"Deserialized":false,"Replication":1}, "Number of Partitions":10,"Number of Cached Partitions":0,"Memory Size":0,"ExternalBlockStore Size":0,"Disk Size":0}],"Parent IDs":[],"Details":"","Submission Time":1458126390256,"Accumulables":[]}, "Properties":{"spark.rdd.scope.noOverride":"true","spark.rdd.scope":"{\"id\":\"1\",\"name\":\"collect\"}","callSite.short":"reduce at pi.py:39"}}
@Javid-Shaik
Identify New Fields and Events: list the events and fields that differ between the two Spark versions (you have already started this above).
Locate Event Parsing Logic: look at the load method in SparkDataCollection.scala, where ReplayListenerBus is used.
Modify Event Listeners: adjust the listeners so that they tolerate the new event shapes.
Add Handlers for New Events: add handling, or explicit skipping, for the event types introduced in Spark 3.5.1; a sketch of the skipping idea follows this list.
Integrate Changes: wire the new handling into the load method and the other relevant parts of the code.
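To make the skipping concrete, here is a minimal Scala sketch that pre-filters an event log before it reaches the Spark 1.4 replay machinery. The object and helper names are hypothetical, and the prefix list is illustrative, not what Dr. Elephant actually ships:

import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import scala.io.Source
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

object EventLogFilter {
  implicit val formats: Formats = DefaultFormats

  // Event types that Spark 1.4's JsonProtocol.sparkEventFromJson cannot match.
  // Extend this list as new MatchErrors show up in the logs.
  private val unknownEventPrefixes = Seq("org.apache.spark.sql.execution.ui.")

  private def isKnown(line: String): Boolean = {
    val event = (parse(line) \ "Event").extract[String]
    !unknownEventPrefixes.exists(event.startsWith)
  }

  // Returns a stream containing only the lines the old parser understands,
  // suitable for handing to the replay code in place of the raw log.
  def filtered(raw: InputStream): InputStream = {
    val kept = Source.fromInputStream(raw, "UTF-8").getLines().filter(isKnown)
    new ByteArrayInputStream(kept.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }
}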
Hi @AbdelrahmanMosly, I have identified the new events and fields that were added in Spark 3.5.1 and removed those events. Now the UI shows the correct data, but I am getting this error:
07-10-2024 11:37:43 ERROR [dr-el-executor-thread-0] com.linkedin.drelephant.ElephantRunner : Failed to analyze SPARK spark-05981aeb46fb4816b20a62ae2fdf6041
javax.persistence.PersistenceException: ERROR executing DML bindLog[] error [Duplicate entry 'spark-05981aeb46fb4816b20a62ae2fdf6041' for key 'yarn_app_result.PRIMARY']
at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:97)
at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.insert(DmlBeanPersister.java:57)
at com.avaje.ebeaninternal.server.persist.DefaultPersistExecute.executeInsertBean(DefaultPersistExecute.java:66)
at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeNow(PersistRequestBean.java:448)
at com.avaje.ebeaninternal.server.core.PersistRequestBean.executeOrQueue(PersistRequestBean.java:478)
at com.avaje.ebeaninternal.server.persist.DefaultPersister.insert(DefaultPersister.java:335)
at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveEnhanced(DefaultPersister.java:310)
at com.avaje.ebeaninternal.server.persist.DefaultPersister.saveRecurse(DefaultPersister.java:280)
at com.avaje.ebeaninternal.server.persist.DefaultPersister.save(DefaultPersister.java:248)
at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1568)
at com.avaje.ebeaninternal.server.core.DefaultServer.save(DefaultServer.java:1558)
at com.avaje.ebean.Ebean.save(Ebean.java:453)
at play.db.ebean.Model.save(Model.java:91)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob$1.run(ElephantRunner.java:399)
at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:699)
at com.avaje.ebeaninternal.server.core.DefaultServer.execute(DefaultServer.java:693)
at com.avaje.ebean.Ebean.execute(Ebean.java:1207)
at com.linkedin.drelephant.ElephantRunner$ExecutorJob.run(ElephantRunner.java:397)
at com.linkedin.drelephant.priorityexecutor.RunnableWithPriority$1.run(RunnableWithPriority.java:36)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry 'spark-05981aeb46fb4816b20a62ae2fdf6041' for key 'yarn_app_result.PRIMARY'
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:118)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:912)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1054)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdateInternal(ClientPreparedStatement.java:1003)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeLargeUpdate(ClientPreparedStatement.java:1312)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeUpdate(ClientPreparedStatement.java:988)
at com.jolbox.bonecp.PreparedStatementHandle.executeUpdate(PreparedStatementHandle.java:205)
at com.avaje.ebeaninternal.server.type.DataBind.executeUpdate(DataBind.java:55)
at com.avaje.ebeaninternal.server.persist.dml.InsertHandler.execute(InsertHandler.java:134)
at com.avaje.ebeaninternal.server.persist.dml.DmlBeanPersister.execute(DmlBeanPersister.java:86)
... 23 more
Please help me in resolving this error.
@Javid-Shaik I don't remember encountering this error, but, as the message indicates, you can simply check for duplicate entries.
@AbdelrahmanMosly It occurs when I restart the Dr. Elephant server: the first time there is no error; it appears only after a restart.
Also, can you please tell me how to get the Job Execution URL, Flow Execution URL, Job Definition URL, etc.? Currently I only get the Spark history server URL of the Spark job in the Dr. Elephant UI, not the rest of the URLs.
Also, is it possible to analyze streaming jobs? Currently Dr. Elephant analyzes batch jobs, i.e. the event logs of already completed applications. If it is possible, please give me a lead on how to analyze streaming jobs.
@Javid-Shaik For the duplicate entry error, it's likely due to something in the Dr. Elephant database, such as a job being recorded twice. You might need to identify and delete the duplicate rows (the stack trace points at the yarn_app_result table) to resolve the issue.
Regarding the URLs, you need to make sure your configuration includes the scheduler URLs so the integration works. Here's an example of the configuration you should add:
azkaban.execution.url=<your-azkaban-execution-url>
oozie.base.url=<your-oozie-base-url>
airflow.base.url=<your-airflow-base-url>
These configurations are necessary because Spark event logs alone are not sufficient for this task.
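On a stock Dr. Elephant build, scheduler integration is usually configured in app-conf/SchedulerConf.xml rather than as flat properties. A hedged sketch for Airflow, with an illustrative parameter name that you should check against the file shipped with your build:

<scheduler>
  <name>airflow</name>
  <classname>com.linkedin.drelephant.schedulers.AirflowScheduler</classname>
  <params>
    <airflowbaseurl>http://your-airflow-host:8080</airflowbaseurl>
  </params>
</scheduler>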
@AbdelrahmanMosly Does this mean that the Spark jobs need to be submitted via a scheduler?
Also, please tell me whether it is possible to analyze streaming jobs.
@Javid-Shaik
Spark Jobs and Schedulers: Spark jobs do not need to be submitted via a scheduler. However, integrating with a scheduler helps in obtaining detailed job metadata and URLs.
Analyzing Streaming Jobs: While Dr. Elephant primarily analyzes batch jobs, it is possible to analyze streaming jobs with some additional effort. Here are some suggestions:
Custom Metrics: Implement custom metrics in your streaming application to emit data that can be monitored and analyzed. These metrics can be pushed to a monitoring system compatible with Dr. Elephant.
Periodic Snapshots: Configure your streaming jobs to periodically write state snapshots or logs that Dr. Elephant can process. This can help in capturing the state and performance of the streaming job over time.
Use Monitoring Tools: Integrate your streaming jobs with monitoring tools like Prometheus or Grafana. These tools can provide real-time insights, and you can correlate this data with Dr. Elephant’s batch analysis to get a more comprehensive view.
Extend Dr. Elephant: If you have the capability, you can modify and extend Dr. Elephant to better support streaming jobs. This would involve capturing and analyzing metrics specific to streaming workloads.
Hybrid Approach: Use a combination of the above methods to gather insights into the performance and efficiency of your streaming jobs.
I haven't personally gone down this path as it wasn’t required in my case, so I don't have direct experience with these methods. However, these suggestions should help you get started.
@AbdelrahmanMosly Thank you, AbdelRahman, for your invaluable guidance. I truly appreciate your help.
@Javid-Shaik You're welcome! Good luck with your work on Dr. Elephant.
PR #357: Uncompressed File Support for Dr Elephant
Using Local Event Logs
Initially, Dr Elephant utilized the YARN Resource Manager to check submitted jobs. However, we made modifications to read from local Spark event logs instead.
If the environment variable USE_YARN is set to true, Dr Elephant will still be able to use the YARN Resource Manager. In this case, it will read and check the logs from the history server of Hadoop (YARN Resource Manager).
Using Uncompressed Files
Dr Elephant originally processed compressed files using a codec. We enhanced it to also support reading uncompressed files.
Spark and Hadoop Versions
Dr Elephant is designed to run on Spark 1.4.0 and Hadoop 2.3.0. However, issues arose when attempting to read event logs generated by Spark 3, as a new listener was introduced that couldn't be identified using the ReplayListenerBus of Spark 1.4.0. To address this, we implemented a workaround that ignores the listener named SparkListenerResourceProfileAdded.
Fetchers Configuration
We pointed Dr Elephant at the Spark event logs directory and disabled the Tez Fetcher in the FetcherConf.xml configuration, as sketched below.
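For illustration, a minimal FetcherConf.xml along those lines. This is a sketch assuming the stock fetcher class names; it is not the PR's exact diff:

<fetchers>
  <!-- Spark fetcher reading event logs from the configured directory -->
  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.FSFetcher</classname>
    <params>
      <event_log_location_uri>file:///Users/shaikbasha/spark-events</event_log_location_uri>
    </params>
  </fetcher>
  <!-- Tez fetcher disabled by commenting it out:
  <fetcher>
    <applicationtype>tez</applicationtype>
    <classname>com.linkedin.drelephant.tez.fetchers.TezFetcher</classname>
  </fetcher>
  -->
</fetchers>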