nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars 116 forks source link

Add an option to wait for cluster destruction #229

Closed alexandrnikitin closed 6 years ago

alexandrnikitin commented 6 years ago

At the moment flintrock doesn't wait for instances termination. It could be useful to add an option (or even force it) to wait for instances to end up at terminated state.

nchammas commented 6 years ago

I think this could be helpful, but to help motivate the issue could you provide some use cases where someone would want to wait for cluster termination?

alexandrnikitin commented 6 years ago

@nchammas Thank you for your reply! Yes, I have a particular use case. There is a limit on number of running instances on AWS. There is a job that destroys the cluster after it finishes. There's another job that starts right after the second, but it can't launch a cluster because there are still instances in terminating state, and total number exceeds the limit. As a workaround: a sleep and additional checks on launch help. I was thinking about the wait_for_state('terminated)` call by default: https://github.com/alexandrnikitin/flintrock/commit/c8d23c807eed59089c12a5e5d2f83613b35d7c1b

nchammas commented 6 years ago

Makes sense to me. Perhaps we could put this behind an option:

flintrock destroy --wait <cluster_name>

And leave the default destroy behavior as-is.

What do you think?

alexandrnikitin commented 6 years ago

I don't have strong opinion on which way is better. Intuitively I think that waiting by default is better with --no-wait option. Do you think it needs a timeout option in addition?

nchammas commented 6 years ago

After thinking a bit more about this, I agree that it's better to have the default behavior be to wait and add a --no-wait option.

By waiting by default, you can chain actions after the destroy without having to cobble together sleep statements or polling for status. And it also fits the semantics of the operation--destroy won't return until the cluster is fully destroyed.

For simplicity, I don't think we need a timeout to start.

Would you like to submit a PR for this?

alexandrnikitin commented 6 years ago

I'm happy to. Here it is #231 I've been thinking about this, and I think it's fine to wait_for_state('terminated) by default. It's consistent with other commands: launch, stop, start.