MikeTheCanuck closed this 4 years ago.
This commit has had the intended effect: it loads the current 2018ND container image as a Fargate-hosted service, and the gunicorn server in front of Django is answering requests.
However, we have a small problem:
So while the Fargate conversion works great, we end up with an API that isn't actually functional. That means there are problems with the API (the Django app) itself, not with the Fargate configuration, but it's still a problem to be solved.
Unfortunately, converting back to the EC2-based service deployment does nothing to improve the API's health: the Swagger schema renders fine, but the API endpoints all 404. So at this point (even if the 2018ND API was working recently), the 2018ND API is down and needs some in-house repair to get back to functional. The CloudFormation configuration is just as "working" in Fargate as it is in EC2-land, so this change stays.
Scratch that - the base API routes such as /api throw a 404, but the actual configured endpoints such as http://service.civicpdx.org/neighborhood-development/api/affordable_housing are working just fine. Stand down, alarms off, back to our usual programming.
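Because the base routes 404 while the configured endpoints respond, a post-deploy smoke check is more trustworthy when it probes the configured endpoints directly rather than the base route. A minimal sketch using only the Python standard library; `http_status` and `failing_endpoints` are hypothetical helper names, and `/api/affordable_housing` is the only endpoint path taken from this thread (any others would have to come from the API's own routes).

```python
from typing import Callable, Iterable, List
from urllib.error import HTTPError
from urllib.request import urlopen

# Base URL taken from the thread; other services would use their own prefix.
BASE_URL = "http://service.civicpdx.org/neighborhood-development"


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET url and return its HTTP status code, including 4xx/5xx codes."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises on error statuses; the status code is still what we want.
        return err.code


def failing_endpoints(
    paths: Iterable[str],
    fetch: Callable[[str], int] = http_status,
    base: str = BASE_URL,
) -> List[str]:
    """Return the configured endpoint paths that did not answer with a 2xx.

    Only real endpoints are probed, so a bare /api route that 404s is not
    mistaken for the whole API being down.
    """
    return [p for p in paths if not 200 <= fetch(base + p) < 300]
```

For example, `failing_endpoints(["/api/affordable_housing"])` should return an empty list while that endpoint is healthy. The `fetch` parameter exists so the check can be exercised without network access.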
The trickiest part of switching a service from EC2 to Fargate is avoiding resource collisions, such as between Task Roles and between Listener Priorities.
In the master.yaml, each Resource is given a unique name, and that name is used as a unique variable input for a variety of AWS objects, including the Task Role that we generate to ensure the Service's Task(s) have sufficient access to the AWS resources they need (e.g. SSM parameters).
When migrating an existing service from EC2 to Fargate, the natural temptation is to copy/paste the existing resource block, comment out the old one, and update or add the Parameters needed for the new Fargate template. Unfortunately, if you re-use the same name (e.g. 2018DR), CloudFormation will fail the stack update and roll back, reporting e.g. Embedded stack arn:aws:cloudformation:us-west-2:845828040396:stack/hacko-integration-2018DR-QYKK83G1ODA0/696db100-6104-11e8-ac84-50a68d01a68d was not successfully updated. Currently in UPDATE_ROLLBACK_IN_PROGRESS with reason: The following resource(s) failed to create: [TaskRole].
And in the embedded stack for the failed service, you'll see e.g. ecs-service-hacko-integration-2018DR-QYKK83G1ODA0 already exists in stack arn:aws:cloudformation:us-west-2:845828040396:stack/hacko-integration-2018DR-QYKK83G1ODA0/696db100-6104-11e8-ac84-50a68d01a68d
So when creating the new Fargate-based Resource for the existing service, I temporarily gave it a different name, e.g. 2018DiRe. Then, after all the rest of the work was done (see Problem 2 and Resolution 2 below), I added a final commit to the PR to rename the resource back to its original name.
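A toy Python model of why the temporary name helps. It rests on two assumptions: the Task Role's physical name follows the `ecs-service-<nested stack name>` pattern visible in the error message quoted earlier, and CloudFormation creates the resources an update adds before it deletes the ones it removes (the deletes happen in the cleanup phase). The helper names and the random suffix generator are hypothetical stand-ins.

```python
import secrets
from typing import Dict, Set


def task_role_name(nested_stack: str) -> str:
    # ASSUMED naming pattern, inferred from the
    # "ecs-service-hacko-integration-2018DR-..." error message.
    return f"ecs-service-{nested_stack}"


def nested_stack_name(parent: str, logical_id: str, existing: Dict[str, str]) -> str:
    """Name of the nested stack that backs `logical_id` after the update.

    `existing` maps logical IDs in the parent template to their current
    nested stack names. Reusing a logical ID updates that nested stack in
    place, so it keeps its name; a new logical ID gets a new nested stack.
    """
    if logical_id in existing:
        return existing[logical_id]
    # Stand-in for the random suffix CloudFormation appends to nested stacks.
    suffix = secrets.token_hex(6).upper()
    return f"{parent}-{logical_id}-{suffix}"


def role_collisions(live_roles: Set[str], roles_to_create: Set[str]) -> Set[str]:
    """Role names the update would create while the old role still exists.

    New resources are created before removed ones are deleted, so any
    overlap fails with "already exists" and the update rolls back.
    """
    return live_roles & roles_to_create


parent = "hacko-integration"
existing = {"2018DR": "hacko-integration-2018DR-QYKK83G1ODA0"}
live_roles = {task_role_name(existing["2018DR"])}  # role owned by the EC2 template

# Reusing 2018DR: the Fargate TaskRole asks for the EC2 role's name.
reused = task_role_name(nested_stack_name(parent, "2018DR", existing))
print(role_collisions(live_roles, {reused}))  # {'ecs-service-hacko-integration-2018DR-QYKK83G1ODA0'}

# Temporary name 2018DiRe: a brand-new nested stack, hence a new role name.
renamed = task_role_name(nested_stack_name(parent, "2018DiRe", existing))
print(role_collisions(live_roles, {renamed}))  # set()
```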
The second problem is that when the EC2 service is deleted in the same stack update that adds the related Fargate service, the new service's ALB listener rules often get created before the old service's rules have been removed. This creates a collision between two services trying to use the same Priority values (which must be unique on each ALB listener, and the services in this cluster share the ALB's listeners; see current assignments here), so the stack update fails and rolls back.
When digging into the details of the failed stack update, you'll see:
Embedded stack arn:aws:cloudformation:us-west-2:845828040396:stack/hacko-integration-2018ND-1EOS87JFEM0C/ca130150-bba2-11e9-aa5a-0650fec6e554 was not successfully created: The following resource(s) failed to create: [ListenerRule, TaskRole, ListenerRuleTls].
Priority '84' is currently in use (Service: AmazonElasticLoadBalancingV2; Status Code: 400; Error Code: PriorityInUse; Request ID: ceffd7a6-bba2-11e9-ba8f-17534ef226a3)
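This kind of PriorityInUse failure can be spotted before deploying by comparing the rule priorities an update would create against those still held by live rules on the same listener. A minimal Python sketch, assuming priorities must be unique per ALB listener and that the new service's rules are created while the old service's rules are still attached. The `ListenerRule` tuple, the listener names, and the service names are hypothetical; only priority 84 comes from the error above, and giving the TLS rule the same priority is an assumption.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class ListenerRule(NamedTuple):
    listener: str  # which shared ALB listener the rule is attached to
    priority: int
    service: str


def priority_conflicts(
    live: Iterable[ListenerRule], to_create: Iterable[ListenerRule]
) -> Set[Tuple[str, int]]:
    """(listener, priority) pairs a new rule would claim while a live rule holds them.

    Assumes priorities are unique per listener and that the update creates
    the new rules before deleting the old ones.
    """
    in_use = {(r.listener, r.priority) for r in live}
    return {(r.listener, r.priority) for r in to_create} & in_use


# Hypothetical rule sets mirroring the failed ListenerRule / ListenerRuleTls pair.
ec2_rules = [ListenerRule("http", 84, "2018ND-ec2"), ListenerRule("https", 84, "2018ND-ec2")]
fargate_rules = [ListenerRule("http", 84, "2018ND-fargate"), ListenerRule("https", 84, "2018ND-fargate")]

print(sorted(priority_conflicts(ec2_rules, fargate_rules)))  # [('http', 84), ('https', 84)]
print(priority_conflicts([], fargate_rules))  # set(), once the EC2 rules are gone
```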
The solution I found is to perform at least two updates to the stack:
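Assuming the updates split the removal of the EC2 service from the addition of the Fargate service, so that no single update both frees and reclaims a priority, the ordering can be sketched in Python. `plan_updates`, `step_is_safe`, and the service names and priorities are hypothetical; "other-service" stands in for the rest of the cluster.

```python
from typing import Dict, List


def step_is_safe(before: Dict[str, int], after: Dict[str, int]) -> bool:
    """True if no rule this update creates reuses a priority held at its start.

    `before`/`after` map service names to listener-rule priorities. Rules
    being removed are still live while new ones are created, so every
    priority in `before` counts as taken.
    """
    added = {prio for svc, prio in after.items() if svc not in before}
    return not added & set(before.values())


def plan_updates(current: Dict[str, int], target: Dict[str, int]) -> List[Dict[str, int]]:
    """Split current -> target into successive stack states.

    Update 1 only removes the outgoing services; update 2 only adds the
    replacement services, by which time their priorities have been freed.
    Services whose priority changes are not handled in this sketch.
    """
    kept = {svc: prio for svc, prio in current.items() if svc in target}
    states: List[Dict[str, int]] = []
    if kept != current:
        states.append(kept)  # update 1: delete the EC2 service
    if target != kept:
        states.append(dict(target))  # update 2: add the Fargate service
    return states


current = {"2018ND-ec2": 84, "other-service": 90}
target = {"2018ND-fargate": 84, "other-service": 90}

print(step_is_safe(current, target))  # False: a single update reuses 84
steps = plan_updates(current, target)
states = [current] + steps
print(all(step_is_safe(a, b) for a, b in zip(states, states[1:])))  # True
```

Under this assumed ordering, the service is not served at all between the two updates, which is tolerable for an inactive project like 2018ND.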
Note: something we discovered and documented via #268 is that each of the 2018 API containers being migrated to Fargate also needs its ecs-deploy.sh script updated to a more recent version.
This is now PR'd to the 2018 Neighborhood Development repo as https://github.com/hackoregon/neighborhoods-2018/pull/111
We have a solid CD pattern for 2019 APIs, and we've successfully converted two 2017 APIs. Others can and should feel comfortable migrating the rest of those containers.
Now let's see what it takes to convert a 2018 container service to Fargate. I'm picking on Neighborhood Development semi-randomly because (a) that project is completely inactive, (b) none of its developers are currently part of Hack Oregon AFAIK, and (c) thus an outage of the API is unlikely to get in anyone's way while we complete the migration.
Addresses #244 for the 2018 Neighborhood Development API.