saltstack / salt


Utilize Boto3 get_waiters() in salt.modules.boto_* #37026

Closed pedrohdz closed 6 years ago

pedrohdz commented 7 years ago

Description of Issue/Question

I am running into all sorts of race conditions while implementing a formula that uses the boto_* states. I looked through the code and found only one call to get_waiter(), in salt/modules/boto_s3_bucket.py.

Maybe I'm missing something here? :-)

Here's an example: a subnet cannot be deleted right after terminating an EC2 instance in it. demo00.local is in subnet0 and is the only server in that subnet. The following states:

# ... stuff truncated ...

EC2 demo server absent:
  boto_ec2.instance_absent:
    - name: demo00.local

Subnet subnet0 absent:
  boto_vpc.subnet_absent:
    - name: subnet0
    - require:
      - boto_ec2: EC2 demo server absent

Will yield something like:

----------
          ID: Subnet subnet0 absent
    Function: boto_vpc.subnet_absent
        Name: subnet0
      Result: False
     Comment: Failed to delete subnet: Bad Request: The subnet 'subnet-63aa8c15' has dependencies and cannot be deleted.
     Started: 07:25:03.078485
    Duration: 943.968 ms
     Changes:

Summary for local
------------
Succeeded: 2 (changed=2)
Failed:    1
------------

It seems that SaltStack tries to delete the subnet before the instance has finished terminating. A get_waiter() call might be useful here. Maybe make it an option?
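To illustrate the idea, here is a minimal sketch of waiting for termination with Boto3's `instance_terminated` EC2 waiter before touching the subnet. `wait_for_termination` is a hypothetical helper name, not something that exists in Salt, and the delay and attempt values are arbitrary assumptions.

```python
def wait_for_termination(ec2_client, instance_ids, delay=15, max_attempts=40):
    """Block until every given instance reaches the 'terminated' state.

    ec2_client is expected to be a Boto3 EC2 client, i.e. boto3.client("ec2").
    The waiter polls DescribeInstances and raises botocore's WaiterError if
    the instances are still not terminated after max_attempts polls.
    """
    waiter = ec2_client.get_waiter("instance_terminated")
    waiter.wait(
        InstanceIds=list(instance_ids),
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )


# Usage sketch (needs boto3 and AWS credentials; the instance ID is a placeholder):
#   import boto3
#   ec2 = boto3.client("ec2")
#   ec2.terminate_instances(InstanceIds=["i-0123456789abcdef0"])
#   wait_for_termination(ec2, ["i-0123456789abcdef0"])
#   ec2.delete_subnet(SubnetId="subnet-63aa8c15")
```

Exposing the delay and attempt count as arguments is one way the wait could be made optional or tunable from a state.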

I have been seeing all sorts of timing-related inconsistencies when using the SaltStack boto_* states to manage AWS. Another example is creating an EC2 instance immediately after creating an IAM role for it. boto_iam_role.present returns immediately, but boto_ec2.instance_present starts before the IAM role is actually created. This gives the following error:

----------
          ID: EC2 demo server exists
    Function: boto_ec2.instance_present
        Name: demo00.demo-saltstack-vpc.local
      Result: False
     Comment: An exception occurred in this state: Traceback (most recent call last):
                File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1733, in call
                  **cdata['kwargs'])
                File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1652, in wrapper
                  return f(*args, **kwargs)
                File "/usr/lib/python2.7/dist-packages/salt/states/boto_ec2.py", line 783, in instance_present
                  region=region, key=key, keyid=keyid, profile=profile)
                File "/usr/lib/python2.7/dist-packages/salt/modules/boto_ec2.py", line 826, in run
                  network_interfaces=interfaces)
                File "/usr/lib/python2.7/dist-packages/boto/ec2/connection.py", line 973, in run_instances
                  verb='POST')
                File "/usr/lib/python2.7/dist-packages/boto/connection.py", line 1208, in get_object
                  raise self.ResponseError(response.status, response.reason, body)
              EC2ResponseError: EC2ResponseError: 400 Bad Request
              <?xml version="1.0" encoding="UTF-8"?>
              <Response><Errors><Error><Code>InvalidParameterValue</Code><Message>Value (DEMO-SALTSTACK-MASTER) for parameter iamInstanceProfile.name is invalid. Invalid IAM Instance Profile name</Message></Error></Errors><RequestID>8d31f575-f6b1-4171-8896-f412677e702b</RequestID></Response>
     Started: 08:19:47.616027
    Duration: 474.711 ms
     Changes:

Summary for local
-------------
Succeeded: 14 (changed=14)
Failed:     1
-------------

Since these failures are timing-related, they can be hit or miss.
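For the IAM case, a rough sketch using Boto3's `instance_profile_exists` IAM waiter is below. `wait_for_instance_profile` is a hypothetical helper, and the defaults are assumptions. Note that the traceback above goes through the older boto library, so this would need a Boto3 IAM client. Also, because IAM is eventually consistent, EC2 may still reject the profile for a short time after the waiter returns, so a retry around the instance launch may still be needed.

```python
def wait_for_instance_profile(iam_client, profile_name, delay=1, max_attempts=40):
    """Block until IAM reports that the instance profile exists.

    iam_client is expected to be a Boto3 IAM client, i.e. boto3.client("iam").
    Raises botocore's WaiterError if the profile is still missing after
    max_attempts polls.
    """
    waiter = iam_client.get_waiter("instance_profile_exists")
    waiter.wait(
        InstanceProfileName=profile_name,
        WaiterConfig={"Delay": delay, "MaxAttempts": max_attempts},
    )


# Usage sketch (needs boto3 and AWS credentials):
#   import boto3
#   iam = boto3.client("iam")
#   wait_for_instance_profile(iam, "DEMO-SALTSTACK-MASTER")
#   # ... then launch the instance with that instance profile ...
```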

For now I can work around this by waiting a minute or two and then rerunning the command. I have even hacked the dependency order. It works, but using get_waiter() to wait for AWS operations to complete seems cleaner.

Setup

Please review above...

Steps to Reproduce Issue

Please review above...

Versions Report

All versions as far as I can tell.

pedrohdz commented 7 years ago

Here's another example I just ran into. When creating a subnet, it seems that _create_resource() in salt/modules/boto_vpc.py tries to assign a name to the new resource, a subnet in this case. A race condition is causing the following error:

[INFO    ] Running state [DEMO-SALTSTACK-VPC-subnet0] at time 09:34:40.905079
[INFO    ] Executing state boto_vpc.subnet_present for DEMO-SALTSTACK-VPC-subnet0
[INFO    ] Subnet DEMO-SALTSTACK-VPC-subnet0 does not exist.
[INFO    ] Matching VPC: vpc-5957b13e
[INFO    ] A subnet with id subnet-5628b60e was created
[ERROR   ] Failed to create subnet: Bad Request: The subnet ID 'subnet-5628b60e' does not exist
[INFO    ] Completed state [DEMO-SALTSTACK-VPC-subnet0] at time 09:34:41.821020 duration_in_ms=915.941
local:
  Name: DEMO-SALTSTACK-MASTER - Function: boto_iam_role.present - Result: Changed Started: - 09:34:32.656596 Duration: 3532.166 ms
  Name: DEMO-PUBLIC_KEY - Function: boto_ec2.key_present - Result: Changed Started: - 09:34:36.194447 Duration: 1005.775 ms
  Name: DEMO-SALTSTACK-VPC - Function: boto_vpc.present - Result: Changed Started: - 09:34:37.203591 Duration: 1639.402 ms
  Name: DEMO-SALTSTACK-VPC - Function: boto_vpc.internet_gateway_present - Result: Changed Started: - 09:34:38.843... Duration: 1023.9 ms
  Name: DEMO-SALTSTACK-VPC - Function: boto_vpc.dhcp_options_present - Result: Changed Started: - 09:34:39.867404 Duration: 1037.417 ms
----------
          ID: Subnet subnet0 exists
    Function: boto_vpc.subnet_present
        Name: DEMO-SALTSTACK-VPC-subnet0
      Result: False
     Comment: Failed to create subnet: Bad Request: The subnet ID 'subnet-5628b60e' does not exist
     Started: 09:34:40.905079
    Duration: 915.941 ms
     Changes:

Summary for local
------------
Succeeded: 5 (changed=5)
Failed:    1
------------
Total states run:     6
Total run time:   9.155 s

I looked in the AWS console and saw that the subnet was created, but it was not assigned the name. My guess is that either _maybe_set_name_tag() or _maybe_set_tags() failed due to a race condition.

I have not been able to reproduce this one.
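I am not sure a built-in waiter covers the "ID does not exist yet" case, so another option is a small retry around the tagging call. The sketch below is only an illustration: `retry_call` and `is_subnet_not_found` are hypothetical helpers, and it assumes the error surfaces as a botocore `ClientError`-style exception carrying the `InvalidSubnetID.NotFound` code in its `response` attribute.

```python
import time


def retry_call(func, is_retryable, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func(); on a retryable exception, back off and try again.

    Re-raises immediately for non-retryable errors, and re-raises the last
    error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts or not is_retryable(exc):
                raise
            sleep(delay * attempt)  # linear backoff: delay, 2*delay, ...


def is_subnet_not_found(exc):
    """True if exc looks like a ClientError with the subnet not-found code."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code") == "InvalidSubnetID.NotFound"


# Usage sketch (needs boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2")
#   retry_call(
#       lambda: ec2.create_tags(
#           Resources=["subnet-5628b60e"],
#           Tags=[{"Key": "Name", "Value": "DEMO-SALTSTACK-VPC-subnet0"}],
#       ),
#       is_subnet_not_found,
#   )
```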

Ch3LL commented 7 years ago

Okay, so we need to narrow down this issue so we know when it can be closed. From what I can see in this issue so far, you are running into race conditions for the following:

Just to clarify: none of the situations above happen consistently, correct?

pedrohdz commented 7 years ago

The first two items can be reproduced on a fairly regular basis, assuming the calls happen in immediate succession.

The third, and final, happens very irregularly.

Ch3LL commented 7 years ago

Thanks for the clarification. We will need to get this fixed up.

pedrohdz commented 7 years ago

Thank you!

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.