nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Ubuntu Support & user configurable wait/sleep time #222

Closed DebPanigrahi closed 6 years ago

DebPanigrahi commented 6 years ago

Thank you for a great product.

I tried launching clusters with an Ubuntu AMI, and the launches failed many times. After looking into the code, I found these bottlenecks:

  1. Waiting to SSH into the cluster: 3 seconds is far too short. Can this be set from the command line or config?
  2. It assumes Python is already installed in order to download Hadoop, so it won't work if Python is missing. Could you allow a user-supplied set of commands to be run beforehand to install dependencies?
  3. It assumes "yum install/remove". Can we optionally make this user-driven, to support apt install/remove?

I still ran into issues with Java-based dependencies on Ubuntu, so I dropped that route. But for Amazon Linux, at least a user-configurable wait time and number of tries would be helpful.

Thank you, Deb

nchammas commented 6 years ago

Hi @pani6me!

  1. Can you elaborate on what the issue is here? Flintrock polls for SSH availability every few seconds so it can start work as soon as possible. I don't understand what the problem is or why someone would want to control the polling interval.
  2. Flintrock works best with Amazon Linux, but if you want to use an AMI that doesn't have Python installed by default, you can install Python as part of instance launch using the --ec2-user-data option.
  3. Ubuntu support is tracked in #95. My comment there still captures my attitude towards Ubuntu: I'd support it if it didn't add much complexity or maintenance burden. But I doubt that is possible, which is why my default position is not to expand Flintrock's support matrix.
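
For readers following point 1, here is a minimal sketch of what polling for SSH availability with a configurable interval and retry count could look like. It is not Flintrock's actual implementation: the function name `wait_for_ssh` and the parameters `interval`, `max_tries`, and `connect_timeout` are hypothetical, and it only checks that the TCP port accepts connections rather than performing a real SSH handshake.

```python
import socket
import time


def wait_for_ssh(host, port=22, interval=3.0, max_tries=20, connect_timeout=3.0):
    """Poll until a TCP connection to host:port succeeds.

    Hypothetical helper, not part of Flintrock. Returns the number of
    attempts used, or raises TimeoutError once max_tries is exhausted.
    """
    for attempt in range(1, max_tries + 1):
        try:
            # Open and immediately close a connection to test reachability.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return attempt
        except OSError:
            # Refused, unreachable, or timed out: wait, then try again.
            if attempt < max_tries:
                time.sleep(interval)
    raise TimeoutError(f"{host}:{port} not reachable after {max_tries} tries")
```

With a short interval the caller starts work as soon as the port opens, while `max_tries` bounds the total wait; a slow-booting AMI would need a larger `max_tries` rather than a longer interval.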

DebPanigrahi commented 6 years ago

Thanks for the info @nchammas

  1. No longer an issue once I used Amazon Linux.
  2. I agree that supporting Ubuntu would be a constant support overhead, especially for custom-built AMIs. --ec2-user-data is a great concept. If you could extend it to accept separate install files for the master and the slaves, that would give users the responsibility and the power to specify the correct commands for any Linux system. I'll probably make this as a temporary hack for my own local use. Flintrock could then put the installation logs for the master and slaves in some tmp area, without having to interpret the logs or the success/failure of the runs.
  3. If the above options for master/slave are available, Ubuntu support may not be needed.
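
To illustrate the idea of user-supplied install commands, here is a small sketch that renders a user-data script for either package manager. The helper `render_user_data` and the `INSTALL_COMMANDS` table are hypothetical and not part of Flintrock, and the package name in the usage line is only an example. The resulting text would be saved to a file and passed through the --ec2-user-data option discussed above.

```python
# Hypothetical mapping from package manager to its install command.
INSTALL_COMMANDS = {
    "apt": "apt-get update -y && apt-get install -y {packages}",
    "yum": "yum install -y {packages}",
}


def render_user_data(package_manager, packages):
    """Return a bash user-data script that installs the given packages.

    Illustrative only: user data runs as root at first boot, so no sudo
    is needed in the generated commands.
    """
    if package_manager not in INSTALL_COMMANDS:
        raise ValueError(f"unsupported package manager: {package_manager}")
    if not packages:
        raise ValueError("at least one package is required")
    install = INSTALL_COMMANDS[package_manager].format(packages=" ".join(packages))
    return "#!/bin/bash\nset -e\n" + install + "\n"


# Example: a script that installs Python on an apt-based AMI.
script = render_user_data("apt", ["python3"])
```

Separate scripts for master and slaves would simply be two calls with different package lists, which is the split requested in point 2.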

nchammas commented 6 years ago

I've reported #223 to capture the feature request for separate --ec2-user-data options.