nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0

Cannot destroy cluster with missing master #113

Closed: sgvandijk closed this issue 8 years ago

sgvandijk commented 8 years ago

It should be possible to destroy a cluster where the master instance is missing. Currently, this gives the following error:

$ flintrock destroy mycluster
Traceback (most recent call last):
  File "/Users/me/PyEnvs/flintrock/bin/flintrock", line 11, in <module>
    sys.exit(main())
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/flintrock.py", line 871, in main
    cli(obj={})
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/flintrock.py", line 373, in destroy
    vpc_id=ec2_vpc_id)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/ec2.py", line 675, in get_cluster
    vpc_id=vpc_id)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/ec2.py", line 720, in get_clusters
    for cluster_name in found_cluster_names]
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/ec2.py", line 720, in <listcomp>
    for cluster_name in found_cluster_names]
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/ec2.py", line 771, in _compose_cluster
    (master_instance, slave_instances) = _get_cluster_master_slaves(instances)
  File "/Users/me/PyEnvs/flintrock/lib/python3.5/site-packages/flintrock/ec2.py", line 759, in _get_cluster_master_slaves
    raise Exception("No master found.")
Exception: No master found.

This happened to me, for instance, when using spot instances on EC2: only my master was terminated when the spot price exceeded my bid.

This error also happens with describe, and I imagine with any command that calls into _get_cluster_master_slaves, although some commands may want to handle a missing master differently than others.
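One way to let each command choose its own behavior is for the master/slave split to report a missing master instead of raising. The sketch below is hypothetical: the Instance type, its role field, and the function name are illustrative stand-ins, not flintrock's actual data model or code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instance:
    # Hypothetical stand-in for an EC2 instance; only the fields this sketch needs.
    instance_id: str
    role: str  # assumed to be either "master" or "slave"


def get_cluster_master_slaves(
    instances: List[Instance],
) -> Tuple[Optional[Instance], List[Instance]]:
    """Split instances into (master, slaves).

    Returns None for the master when none is found, instead of raising,
    so that each caller can decide how to handle a missing master."""
    masters = [i for i in instances if i.role == "master"]
    slaves = [i for i in instances if i.role == "slave"]
    if len(masters) > 1:
        # Several masters is still treated as an error in this sketch.
        raise ValueError(
            "Found {} masters; expected at most one.".format(len(masters)))
    master = masters[0] if masters else None
    return master, slaves
```

With this shape, the "No master found." decision moves out of the helper and into each caller.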

nchammas commented 8 years ago

Thanks for the report. This is a valid issue.

From a user perspective, perhaps the friendlier thing to do is to just issue a warning if the master is missing, but continue anyway for operations like describe, destroy, and stop. For other commands, we should error out.
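The proposed policy (warn and continue for describe, destroy, and stop; error out otherwise) could be expressed as a small check that each command runs after looking up the cluster. This is a sketch under assumptions: the MASTER_OPTIONAL_COMMANDS set, the MissingMasterError type, and check_master are hypothetical names, not part of flintrock.

```python
import warnings
from typing import Optional

# Commands that, per the proposal, should warn and carry on without a master.
# Hypothetical constant for illustration.
MASTER_OPTIONAL_COMMANDS = {"describe", "destroy", "stop"}


class MissingMasterError(Exception):
    """Hypothetical error for commands that cannot run without a master."""


def check_master(command: str, cluster_name: str, master: Optional[object]) -> None:
    """Warn or raise depending on whether `command` can tolerate a missing master."""
    if master is not None:
        return  # Nothing to do: the master is present.
    message = "Cluster {!r} has no master instance.".format(cluster_name)
    if command in MASTER_OPTIONAL_COMMANDS:
        # Issue a clear warning, then let the command proceed on the slaves.
        warnings.warn(message + " Continuing with '{}' anyway.".format(command))
        return
    raise MissingMasterError(message + " Cannot run '{}'.".format(command))
```

A command like destroy would call check_master right after fetching the cluster and then act on whatever slave instances remain.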

Does that seem reasonable?

Going forward, by the way, I would really like to change the spot behavior so that only the slaves get launched as spot instances (#82) to avoid situations like this. That work is dependent on #16, which I am working on now.

Actually, if I can get #16 out soon and just change the spot behavior as described, then we won't need to change _get_cluster_master_slaves().

sgvandijk commented 8 years ago

Yeah, with that spot behavior the problem I had wouldn't have happened. But it's probably still nice to handle this. On one hand, I'd want somebody who accidentally terminates my master to go through the pain of cleaning everything up by hand, but not if that idiot is me ;)

nchammas commented 8 years ago

I think you're right. Probably the most practical approach is just to issue a clear warning and try to do what the user wants.