nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars 116 forks source link

Support different instance type of slave #241

Closed wellshs closed 4 years ago

wellshs commented 6 years ago

Hello, I'm Hyunsuk

This PR makes the following changes:

In our use case, we create slave instance when we needed while master instance always alive. But, master slave utilization is low when doesn't work with slave, we want to give cheaper type to master instance.

I tested this PR by our team usage & pytest_static.

Any feedback or comments are welcome Thanks!

nchammas commented 6 years ago

Thanks for submitting this PR @wellshs. Some questions and comments for you:

  1. Over on https://github.com/nchammas/flintrock/pull/166#issuecomment-274999175 we were discussing the possibility of just colocating the master on one of the slave nodes. Is that something you considered?
  2. The current PR doesn't add any new options to the launch command. If we're going to allow the master and slaves to have different instance types, then launch should support that too.
  3. Any new options we add should have matching lines in the config template.
  4. Without any strict control over what instance types users can pick (e.g. via a manifest), users can end up with several instance types mixed in the same cluster. The author of #166 reported that just mixing the instance types across the master and slaves caused some issues with HDFS. Have you tested mixed instance type clusters with HDFS?

Related: #166 and #199.

wellshs commented 6 years ago

Thanks for comment @nchammas.

Now I realize that many other issues related to this PR

For your question & comments A1. I didn't consider to use one node as master & slave. I think this option is very good. For our use case, sometime we have to attach one slave to use spark, even master node have resources.

Comment 2, 3. A2,3. Oh, I didn't consider for launch option. Just thinking our use case. I'm inexperienced because this is my first time to contribute open source project.

A4. No I just tested for same instance type.

xiandong79 commented 6 years ago

Really useful option, especially for the assumption of "heterogeneous" cluster!

nchammas commented 6 years ago

@wellshs - No worries about this being your first contribution! As you can see, this issue is a bit more involved than it looks.

I think if we want to allow for separate instance types across the master and slaves, we need to address points 2, 3, and 4 from my earlier comment.

Otherwise, we should go with the approach described in #166 of colocating the master on one of the slaves.

wellshs commented 6 years ago

@nchammas I think colocating the master on one of the slaves doesn't fit when other scheduler and spark master on one instance.

In my use case, scheduler and spark master in the same instance. Therefore instance work when scheduler trigger the works, which spark doesn't needed. Also, in my experience whenever slave instance type are same(different with master), I don't experience problem(address point 4). Maybe, allowing different instance type for master and slave. And keep slave instance type must be same can be a solution.

I like colocating idea and sometimes it will be great when only one slave is needed.

nchammas commented 4 years ago

Closing this PR per our discussion. I need to find a better way to enable popular feature requests like this. Sorry about that @wellshs!