linkedin / dr-elephant

Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
Apache License 2.0
1.35k stars 858 forks

[FEEDBACK] Request for response to contributor/user survey #685

Closed ShubhamGupta29 closed 4 years ago

ShubhamGupta29 commented 4 years ago

We are increasing our focus on Dr. Elephant and its community of contributors and users. Our immediate goal is to support the latest versions of Spark. Over the last year, we have also noticed that Dr. Elephant is being used more on cloud platforms such as AWS and Azure. The following issues track these efforts:

  1. Support for Spark 2.3/2.4 in Dr. Elephant
  2. Support for Hadoop 3
  3. Installation instructions for AWS EMR and Azure HDInsight

We would also like to hear how you are using Dr. Elephant, and how the tool and its community can be improved. Could you please respond to this survey? Your responses will help us prioritize features.

If we have missed something in the survey, please let us know in this issue or in the last question of the survey.

@astahlman @xglv1985 @mareksimunek @sri840 @tooptoop4

ShubhamGupta29 commented 4 years ago

@mareksimunek @xglv1985 Thanks for filling out the survey. We have added one more question to it: do you prefer to install Dr. Elephant using Docker containers? Kindly share your preference regarding a Docker-based installation of Dr. Elephant.

tooptoop4 commented 4 years ago

@ShubhamGupta29 Will there be support for Spark standalone (no YARN)?

xglv1985 commented 4 years ago

> @mareksimunek @xglv1985 Thanks for filling out the survey. We have added one more question to it: do you prefer to install Dr. Elephant using Docker containers? Kindly share your preference regarding a Docker-based installation of Dr. Elephant.

Yes, I prefer Docker containers, especially when I have more than one YARN cluster to be tracked by Dr. Elephant. But I also hope the non-Docker installation of Dr. Elephant will be kept, so that it stays compatible with our online Dr. Elephant deployment.

mareksimunek commented 4 years ago

@ShubhamGupta29 Yes, Docker is the preferred way, or an Ansible playbook. How are you installing Dr. Elephant at LinkedIn?

ShubhamGupta29 commented 4 years ago

> @ShubhamGupta29 Will there be support for Spark standalone (no YARN)?

@tooptoop4 Can you provide details about your use case? Using Dr. Elephant for standalone jobs seems like overkill.

ShubhamGupta29 commented 4 years ago

> @ShubhamGupta29 Yes, Docker is the preferred way, or an Ansible playbook. How are you installing Dr. Elephant at LinkedIn?

@mareksimunek We install it the same way as described in the documentation.

tooptoop4 commented 4 years ago

@ShubhamGupta29 I don't have EMR/Cloudera, just Hive/Spark/S3.

shkhrgpt commented 4 years ago

@ShubhamGupta29 Thanks for starting this thread. Improving Spark support is much needed. However, I have one concern: are we still going to have the customSHSWork branch? I am asking because changes made in customSHSWork are not useful to the rest of the community, since we don't have access to the custom Spark History Server that LinkedIn uses.

ShubhamGupta29 commented 4 years ago

> @ShubhamGupta29 Thanks for starting this thread. Improving Spark support is much needed. However, I have one concern: are we still going to have the customSHSWork branch? I am asking because changes made in customSHSWork are not useful to the rest of the community, since we don't have access to the custom Spark History Server that LinkedIn uses.

At LinkedIn we will continue to use customSHSWork, but for open source we will add all new changes to the master branch itself. From now on, master will be the branch in focus for all new development, keeping the open-source community in mind.

ShubhamGupta29 commented 4 years ago

@shkhrgpt and @tooptoop4 There is a feedback survey for Dr. Elephant. Kindly fill it out and mention any features you would like to have in Dr. Elephant. Survey: https://forms.gle/Fb956VQuyXREvfmM6

ShubhamGupta29 commented 4 years ago

Closing this thread, as a similar thread is in place: #687

ShubhamGupta29 commented 4 years ago

> @ShubhamGupta29 I don't have EMR/Cloudera, just Hive/Spark/S3.

@tooptoop4 Which resource manager (RM) are you using? It would be better to mention this in the survey. You can also create a separate thread for it, so that we can understand the requirements well and provide the support you need from Dr. Elephant's end.

theyaa commented 4 years ago

I am trying to integrate Dr. Elephant with HDP3 (Hadoop 3 and Hive 3), and I am looking at the following features.