schubergphilis / awsapilib

A python library exposing services that are not covered by the official boto3 library but are driven by undocumented APIs.
MIT License
60 stars 8 forks source link

Added wait_for_controltower parameter to create_account #25

Closed nikoizs closed 2 years ago

nikoizs commented 2 years ago

Added wait_for_controltower parameter to the "create_account" method, that gives the choice to skip the wait for control tower (to pick up the account creation process that was started by service catalog), in cases where it's not necessary and execution time is a limiting factor.

costastf commented 2 years ago

The issue with this change is that control tower is absolutelly uniprocess for any creating process. If the wait for service catalog is removed, then we potentially create a race condition were upon creating an account, before the service catalog starts running, so there is no way for the library to know that something is executing you can request a creation of another account or an OU which will subsequently fail leaving the creationg requesting process in an inconsistent state without any feedback about the failure either. The wait for the service catalog should not be more that 1 or 2 seconds at most so I do not expect it to be such an issue. Thoughts?

costastf commented 2 years ago

To elaborate a bit, the library is trying to check if control tower is busy with some heuristics that can be seen here. It is important that if any of those sitiuations occur no active process is tried because it will fail. This decorator makes sure that any action that is active is allowed only if control tower is not busy already and if it is it will raise an exception. For this mechanism to work properly we need to make sure that no action is executed without the process being able to identify if one of the busy heuristics are active. When you create an account a request is made to the control tower control plane that in the background does stuff and eventually trigers the creation of the account via the service catalog product. There is a small time delay between those two steps but the request to the control tower control plane returns imediatelly which leaves a window for a race condition. That is why that wait is important there.

nikoizs commented 2 years ago

After debugging it a bit more , I found that the timeout is caused by a bug in the get_changing_accounts method that is used to check if control tower is busy here. The get_chaning_accounts method is not paginating the results returned from the servicecatalog api, and will only check the first 100 provisioned products (here).

I will close this pull request and create a new one.

costastf commented 2 years ago

Great catch!