Closed MikeTheCanuck closed 4 years ago
I attempted to start out with the bog-standard 2019-fargate-api.yaml template that all other current API containers use to deploy to ECS.
However, we ran into a neat little catch-22: we can't use Path: /* for routing, because that would then overtake the sub-routing we're doing for each of the API services (e.g. Budget has Path: /budget/*), and we can't use Path: / either, because it's a well-structured SPA with a separate directory for non-HTML named __assets, which requires that we also implement a separate Listener that routes __assets/* to the same container (or rather, a pair of Listeners for http:// and https:// traffic).
So I attempted to refactor the standard template into a new one that implemented the additional Listeners and Path, using the lessons of the existing service.yaml that we currently use to configure the Endpoints service on EC2.
I got it all deployed and everything looked like it would work, but in the end CloudFormation wasn't able to consider the deployment successful, because the containers continually reported "Task has exited" prematurely, with the error CannotPullContainerError: Error response from daemon: Get https://845828040396.dkr.ecr.us-west-2.amazsonaws.com/v2/: Unable to connect.
This error is new to us, so a little google-fu and we run across many explanations similar to this one: https://github.com/aws/amazon-ecs-agent/issues/1128
Which basically tells us that for whatever reason, the Task is not able to reach ECR over the Internet - i.e. there's a routing or SecurityGroup limitation so that all ECR download requests go unanswered.
Further:
It turns out this is a test. Did you see the error above? The answer is up there in black and white, and you'll know it when you see it.
....spoilers ahead....
Did you see it?
CannotPullContainerError: Error response from daemon: Get https://845828040396.dkr.ecr.us-west-2.amazsonaws.com/v2/: Unable to connect
No? Well then you're no worse than me at this.
Look closer:
amazsonaws.com
Yep, somehow I slipped another character in, and the only way I noticed it was by pasting, then Cmd-Z'ing, and back and forth, until I noticed the reason why the EcrImage parameter was off by one character in length.
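A typo like this could be caught before deploying by checking the image URI against the expected shape of an ECR hostname. The following is a minimal sketch, not part of the actual templates: the regular expression is my assumption about what a private ECR image URI looks like in the standard commercial regions, `looks_like_ecr_image` is a hypothetical helper, and `example-repo` is a made-up repository name.

```python
import re

# Assumed shape of a private ECR image URI (standard commercial regions only):
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag]
ECR_IMAGE_RE = re.compile(
    r"^\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/[a-z0-9._/-]+(:[\w.-]+)?$"
)


def looks_like_ecr_image(uri: str) -> bool:
    """Return True if uri matches the assumed ECR image URI shape."""
    return ECR_IMAGE_RE.match(uri) is not None


# "example-repo" is a hypothetical repository name used for illustration.
good = "845828040396.dkr.ecr.us-west-2.amazonaws.com/example-repo:latest"
bad = "845828040396.dkr.ecr.us-west-2.amazsonaws.com/example-repo:latest"

print(looks_like_ecr_image(good))
print(looks_like_ecr_image(bad))
```

Running a check like this on the EcrImage parameter value before handing it to CloudFormation would flag the extra character immediately, rather than after a failed deployment.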
Once this was conquered, I then attempted to deploy a flattened version of the SPA, where all files were housed at the root route, so that rather than requesting service.civicpdx.org/__assets/index.css, we could instead request service.civicpdx.org/index.css - and thus be able to skip the additional unique configuration of a second set of Listeners for /__assets and the associated Path.
But alas, this is not to be. When we route only / to the container rather than /*, that literally does mean we only route service.civicpdx.org/ - any request for service.civicpdx.org/index.css (not to mention even service.civicpdx.org/index.html - which is the only resource that nginx is magically passing back to the requestor) gets "blocked" by ALB with a 503 response.
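To make the difference between the path patterns concrete, here is a rough sketch that approximates ALB path-pattern matching in plain Python. It assumes the documented wildcard semantics (`*` matches zero or more characters, `?` matches exactly one, matching is case-sensitive and covers the whole path); `alb_path_matches` is a hypothetical helper for illustration, not AWS code, and it ignores rule priorities entirely.

```python
import re


def alb_path_matches(pattern: str, path: str) -> bool:
    """Approximate an ALB path-pattern condition.

    Assumed semantics: '*' matches zero or more characters, '?' matches
    exactly one character, and the whole path must match, case-sensitively.
    """
    # Escape everything, then turn the escaped wildcards back into regex wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, path) is not None


# Path: / matches only the bare root, so every other file falls through.
print(alb_path_matches("/", "/"))
print(alb_path_matches("/", "/index.css"))

# Path: /* matches everything, including paths meant for the API services.
print(alb_path_matches("/*", "/index.css"))
print(alb_path_matches("/*", "/budget/example"))

# Path: /__assets/* matches only the asset directory.
print(alb_path_matches("/__assets/*", "/__assets/index.css"))
print(alb_path_matches("/__assets/*", "/index.html"))
```

Under these assumptions, a rule for / alone never sees /index.css or /index.html, which is consistent with those requests finding no matching target and getting the 503.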
As I've said before, if we were to try to capture all requests for just the files at the root of the / route, there's no easy way to do that. (It even occurs to me that we could rename all the assets to a variant of /index*, e.g. index.html, index.css, index.svg, and then set up a Path route for /index* to be sent to this container - but even that is just as unnecessarily brittle as the solution we have that works.)
So I'm back to suffering with a one-off template (fargate-endpoints-catalog.yaml) that will likely only ever be used for one resource in all our assets on this HackOregon "stack". And while that works fine, it's definitely going to be a stumbling block for a future maintainer of this project (even future me is likely to get tripped up again).
This should be a trivial operation - it's a single-page nginx app after all - but of course there are complications.
Related to #244 but will require special work one way or another.