linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0

Dr. Elephant can't access the JobHistory server and ResourceManager if they are configured to use HTTPS #394

Open · OleksiiDuzhyi opened this issue 6 years ago

OleksiiDuzhyi commented 6 years ago

Hi,

I'm trying to set up Dr. Elephant for our cluster. Here is the relevant part of our configuration:

mapred-site.xml

  <property>
    <name>mapreduce.jobhistory.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.https.address</name>
    <value>HOST:PORT</value>
  </property>

yarn-site.xml

  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>HOST1:PORT</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>HOST2:PORT</value>
  </property>

However, classes such as com.linkedin.drelephant.analysis.AnalyticJobGeneratorHadoop2 and com.linkedin.drelephant.mapreduce.fetchers.MapReduceFetcherHadoop2 ignore the possibility that these services use HTTPS and always connect over HTTP. Would it make sense to add HTTPS support?
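To make the request concrete, here is a minimal Java sketch (the classes named above are Java) of how a fetcher could pick the URL scheme and address key from the two *.http.policy properties shown in the configuration. The class WebAppUrlResolver and its methods are hypothetical and are not part of Dr. Elephant or Hadoop. A plain Map stands in for Hadoop's Configuration object, and falling back from the *.https.address key to the plain *.address key when the former is unset is an assumption, made so that the yarn-site.xml above (which only sets yarn.resourcemanager.webapp.address.rm1/rm2) still resolves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not Dr. Elephant's or Hadoop's API) that builds base
 * URLs for the JobHistory server and ResourceManager web apps, honoring the
 * HTTPS_ONLY policy. A Map stands in for Hadoop's Configuration.
 */
public final class WebAppUrlResolver {

    private WebAppUrlResolver() {}

    /** True if the policy value requests HTTPS-only web endpoints. */
    static boolean httpsOnly(String policy) {
        return "HTTPS_ONLY".equalsIgnoreCase(policy);
    }

    /** Base URL of the JobHistory server web app, e.g. "https://host:port". */
    public static String jobHistoryUrl(Map<String, String> conf) {
        boolean https = httpsOnly(conf.get("mapreduce.jobhistory.http.policy"));
        return buildUrl(https, lookup(conf, https,
                "mapreduce.jobhistory.webapp.https.address",
                "mapreduce.jobhistory.webapp.address"));
    }

    /** Base URLs of every ResourceManager listed in yarn.resourcemanager.ha.rm-ids. */
    public static List<String> resourceManagerUrls(Map<String, String> conf) {
        boolean https = httpsOnly(conf.get("yarn.http.policy"));
        String httpsKey = "yarn.resourcemanager.webapp.https.address";
        String httpKey = "yarn.resourcemanager.webapp.address";
        String ids = conf.get("yarn.resourcemanager.ha.rm-ids");
        List<String> urls = new ArrayList<>();
        if (ids == null || ids.trim().isEmpty()) {
            // Non-HA cluster: a single, unsuffixed address key.
            urls.add(buildUrl(https, lookup(conf, https, httpsKey, httpKey)));
            return urls;
        }
        for (String raw : ids.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue;
            }
            // HA cluster: address keys are suffixed with the RM id, e.g. ".rm1".
            urls.add(buildUrl(https, lookup(conf, https, httpsKey + "." + id, httpKey + "." + id)));
        }
        return urls;
    }

    /**
     * Reads the HTTPS key when HTTPS is required; otherwise, or if that key is
     * unset (assumed fallback), reads the plain HTTP key.
     */
    private static String lookup(Map<String, String> conf, boolean https,
                                 String httpsKey, String httpKey) {
        String value = https ? conf.get(httpsKey) : null;
        if (value == null) {
            value = conf.get(httpKey);
        }
        if (value == null) {
            throw new IllegalArgumentException(
                    "Missing configuration key: " + (https ? httpsKey : httpKey));
        }
        return value;
    }

    /** Prefixes host:port with the scheme implied by the policy. */
    private static String buildUrl(boolean https, String hostPort) {
        return (https ? "https://" : "http://") + hostPort.trim();
    }
}
```

With the mapred-site.xml above, jobHistoryUrl would return "https://HOST:PORT", and with the yarn-site.xml above, resourceManagerUrls would return one https:// URL per RM id. Deriving the scheme from the policy property, rather than probing both ports, keeps the fetcher aligned with what the cluster itself is configured to serve.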

arpang commented 5 years ago

Yes, it does, IMHO. @akshayrai, any comments?

arpang commented 5 years ago

I have created a pull request here: https://github.com/linkedin/dr-elephant/pull/401