nchammas / flintrock

A command-line tool for launching Apache Spark clusters.
Apache License 2.0
636 stars 116 forks source link

Spot instance tags issue #364

Closed maxpoulain closed 7 months ago

maxpoulain commented 8 months ago

Hi !

It seems that there is an issue regarding TagSpecifications in Make finding cluster nodes more reliable in the face of launch failures. As we are interested in this new feature, we tried to make a release in our forked version but we got this issue:

Parameter validation failed:
Unknown parameter in LaunchSpecification: "TagSpecifications", must be one of: SecurityGroupIds, SecurityGroups, AddressingType, BlockDeviceMappings, EbsOptimized, IamInstanceProfile, ImageId, InstanceType, KernelId, KeyName, Monitoring, NetworkInterfaces, Placement, RamdiskId, SubnetId, UserData

If we are not wrong aout this issue, it seems that this issue comes from TagSpecifications in common_launch_specs that is a parameter of request_spot_instances and not a key inside LaunchSpecification dict as you can see here. Also, we can't put ResourceType': 'instance' in client.request_spot_instances as explained in the documentation:

TagSpecifications (list) --
The key-value pair for tagging the Spot Instance request on creation. The value for ResourceType must be spot-instances-request , otherwise the Spot Instance request fails. To tag the Spot Instance request after it has been created, see CreateTags.

So, I think, we need to put ResourceType': 'spot-instances-request' if we want to tag the spot request instance (not spot instance), if not we can remove TagSpecifications from spot request instance.

Here is a possible solution in _create_instances:

  1. Remove TagSpecifications from common_launch_specs
  2. Add TagSpecifications=[{'ResourceType': 'spot-instances-request', 'Tags': tag_specifications}], in client.request_spot_instances
  3. Add TagSpecifications=[{'ResourceType': 'instance', 'Tags': tag_specifications}], in ec2.create_instances
  4. Add tags to spot instances using create_tags as last step inside if spot_price::
    (ec2.instances
             .filter(
                Filters=[
                    {'Name': 'instance-id', 'Values': [instance.id for instance in cluster_instances]}
                ])
             .create_tags(Tags=tag_specifications))

We have done these modifications in our fork, so if you're okay with that, we can do the same here or if you have maybe a more elegant way to do that it could be interesting.

I hope it's clear and this helps !

Maxime

nchammas commented 8 months ago

Thanks for the clear report. I appreciate it.

I did not think to test spot instances as I worked on #362. Apologies.

I would welcome a PR to fix this issue. However, I would not want to re-introduce tagging as a separate step, as you proposed in step (4) above. This recreates the original problem of cluster nodes becoming orphaned if something goes wrong between when they are launched and when they are tagged.

We want tagging to happen as part of the same AWS API call that creates the instances, whether they are spot or not.

Is that not possible with spot instances? i.e. Are tags for spot instance requests not automatically also tags for the spot instances themselves?

maxpoulain commented 8 months ago

Hi,

Ok it's more clear now why you don't use create_tags anymore.

But when we tried to launch spot instances with TagSpecifications using request_spot_instances, the request is tagged but instances doesn't inherit from tags of the request. So it seems, it's not possible to tag spot instances during the same call that creates instances.

nchammas commented 8 months ago

Looks like the boto3 method we use to launch spot instances is deprecated, and the solution is to migrate off that method.

We strongly discourage using the RequestSpotInstances API because it is a legacy API with no planned investment.

maxpoulain commented 8 months ago

Oh actually I hadn't seen that ! If we follow AWS recommandation to launch spot instances Which is the best Spot request method to use?, it seems that we need to use create_fleet. I don't know exactly what is the impact of this change.. But regarding our Tag problem, it seems that create_fleet support instance tagging, so we could have a single AWS API call that launches the instances and tag them as described in the documentation:

TagSpecifications (list) –

The key-value pair for tagging the EC2 Fleet request on creation. For more information, see [Tag your resources](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-resources).

If the fleet type is instant, specify a resource type of fleet to tag the fleet or instance to tag the instances at launch.

If the fleet type is maintain or request, specify a resource type of fleet to tag the fleet. You cannot specify a resource type of instance. To tag instances at launch, specify the tags in a [launch template](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-launch-templates.html#create-launch-template).
nchammas commented 8 months ago

I thought create_instances is the API we need to use, actually. We are already using it to launch on-demand instances. It just needs different parameters to configure it for spot instances. That way we are using a single launch API across Flintrock. (Note that RunInstances from the AWS docs maps to create_instances in boto3.)

That was my plan, but I only took a quick look and plan to come back to this later. I think in the future we could migrate from create_instances to create_fleet if we want to support launching clusters with different master/slave instance types, but that's not required for now.

If you want to take a crack at this, go ahead. Otherwise, I will try to get to this next week.