linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0
1.36k stars 859 forks

Not able to compile Dr. Elephant on spark2.4.4 #639

Open CalvinNeo opened 4 years ago

CalvinNeo commented 4 years ago

I am trying to compile Dr. Elephant on my mac. I already installed the following dependencies:

  1. spark 2.4.4
  2. hadoop 2.7.7
  3. activator 1.3.12

However, when I run sudo sh ./compile.sh ./compile.conf, it prints:

[error] (*:update) sbt.ResolveException: unresolved dependency: org.apache.spark#spark-core_2.10;2.4.4: not found
ShubhamGupta29 commented 4 years ago

@CalvinNeo can you provide your Dependencies.scala? This looks like a wrong dependency declaration issue.

jilugulu commented 4 years ago

Hi, I was wondering whether your Spark 2.4.4 build compiled successfully?

CalvinNeo commented 4 years ago

@jilugulu I did not compile Spark 2.4.4 myself; I downloaded it from https://spark.apache.org/downloads.html.

@ShubhamGupta29 How can I locate Dependencies.scala?

ShubhamGupta29 commented 4 years ago

It's under the /project folder. @CalvinNeo, can you share the changes you have made so far to use a higher version of Spark?

CalvinNeo commented 4 years ago

@ShubhamGupta29 This is my Dependency.scala. Dependencies.scala.zip

I modified some conf files, including these changes to spark-defaults.conf:

spark.master                     spark://localhost:7077
spark.eventLog.enabled           false
spark.eventLog.dir               hdfs://localhost:9000/user/spark/appHist
spark.history.fs.logDirectory      /.../spark-2.4.4-bin-hadoop2.7/conf/history/spark-events

I also export SPARK_MASTER_HOST=localhost; otherwise it defaults to my computer's hostname, which the workers cannot resolve.

jilugulu commented 4 years ago

Did you successfully compile Dr. Elephant against Spark 2.4.4?

ShubhamGupta29 commented 4 years ago

@CalvinNeo you are facing this issue because Dr. Elephant is trying to fetch spark-core_2.10:2.4.4, but Spark 2.4.4 is not published for Scala 2.10. Spark 2.2 is the last version compatible with Scala 2.10.
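For context, the `_2.10` suffix in the unresolved artifact is the Scala binary version that sbt appends automatically. A minimal sketch of the kind of declaration involved, assuming project/Dependencies.scala pins the Spark version roughly like this (the names here are illustrative, not the repo's exact code):

```
// Hypothetical excerpt of project/Dependencies.scala; the real file may differ.
// With %%, sbt appends the Scala binary version to the artifact name,
// so "spark-core" resolves to spark-core_2.10 when scalaVersion is 2.10.x.
val sparkVersion = "2.4.4"

val sparkDependencies = Seq(
  // Spark 2.4.x is published for Scala 2.11 and 2.12 only, so raising
  // sparkVersion here also requires raising the project's scalaVersion.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided"
)
```

This is why the error mentions spark-core_2.10;2.4.4: the version bump alone makes sbt look for an artifact combination that was never released.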

CalvinNeo commented 4 years ago

@ShubhamGupta29 Actually, I ran echo $SCALA_HOME and it shows my Scala is 2.13.1

ShubhamGupta29 commented 4 years ago

What's the value of scalaVersion in build.sbt?

CalvinNeo commented 4 years ago

@ShubhamGupta29 It's 2.10.4, and I didn't modify this file.
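For reference: sbt resolves artifacts using the scalaVersion set in the build, not the Scala installation pointed to by SCALA_HOME, which is why the locally installed 2.13.1 is irrelevant here. A minimal sketch of the change implied by the discussion, assuming the version is pinned in build.sbt (the exact setting in the repo may differ):

```
// Hypothetical build.sbt fragment: raise the Scala binary version so sbt
// looks for spark-core_2.11 (published for Spark 2.4.4) instead of
// spark-core_2.10 (whose last Spark release line is 2.2).
scalaVersion := "2.11.12"
```

Note that bumping scalaVersion alone may surface further source incompatibilities in a codebase written for Scala 2.10.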

CalvinNeo commented 4 years ago

I read some introductions to Dr. Elephant. In my opinion, it needs metrics from Spark/Hadoop. As far as I know, Spark's event log is a JSON file, so I could just download it and parse it anywhere; why should I first compile this project and then copy it to machines that have Hadoop and Spark?
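As the comment notes, a Spark event log is a file of newline-delimited JSON events and can be parsed on any machine. A minimal sketch of the idea (the field names follow the Spark 2.4 event-log format as I understand it, and the sample log here is fabricated for illustration; verify against a real log):

```python
import json

# Fabricated sample: one JSON object per line, each with an "Event" field
# naming the Spark listener event that was recorded.
SAMPLE_LOG = "\n".join([
    json.dumps({"Event": "SparkListenerApplicationStart", "App Name": "demo"}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 120}}),
    json.dumps({"Event": "SparkListenerTaskEnd",
                "Task Metrics": {"Executor Run Time": 80}}),
])

def summarize(log_text):
    """Count events by type and sum executor run time over TaskEnd events."""
    counts, run_time_ms = {}, 0
    for line in log_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        name = event.get("Event", "unknown")
        counts[name] = counts.get(name, 0) + 1
        if name == "SparkListenerTaskEnd":
            run_time_ms += event.get("Task Metrics", {}).get("Executor Run Time", 0)
    return counts, run_time_ms

counts, total = summarize(SAMPLE_LOG)
```

Dr. Elephant does much more than this (heuristics, scoring, a web UI), but the raw input really is just parseable JSON lines like these.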

luoze-god commented 4 years ago

Look at this author's GitHub fork: https://github.com/BruceXu1991/dr-elephant. He solved this problem (Dr. Elephant on Spark 2.4.x).

pattans commented 4 years ago

I was able to make Spark 2.4 work. The only thing I modified is FetcherConf.xml, as below, after the deployment. No version changes during the build (compile.sh).

--- FetcherConf.xml

  <fetcher>
    <applicationtype>spark</applicationtype>
    <classname>com.linkedin.drelephant.spark.fetchers.SparkFetcher</classname>
    <params>
      <use_rest_for_eventlogs>true</use_rest_for_eventlogs>
      <should_process_logs_locally>true</should_process_logs_locally>
    </params>
    <params>
      <event_log_size_limit_in_mb>500</event_log_size_limit_in_mb>
      <event_log_location_uri>webhdfs://<Name Node Server>:50070/user/spark/spark2ApplicationHistory</event_log_location_uri>
    </params>
  </fetcher>