hackoregon / 2019-backend-cookiecutter-django

Cookiecutter template for creating a backend image
MIT License

ECS docker image deploy fails #37

Open nam20485 opened 5 years ago

nam20485 commented 5 years ago

Description of problem

Deployment of docker image to ECS infrastructure fails, producing the following output:

bin/ecs-deploy.sh: line 144: [: ==: unary operator expected
bin/ecs-deploy.sh: line 148: [: !=: unary operator expected
bin/ecs-deploy.sh: line 152: [: !=: unary operator expected
Using image name: 845828040396.dkr.ecr.us-west-2.amazonaws.com/staging/2019-sandbox:latest
bin/ecs-deploy.sh: line 275: [: !=: unary operator expected
bin/ecs-deploy.sh: line 293: TASK_DEFINITION_ARN: unbound variable
Script failed with status 1
failed to deploy

The first error is raised when the following line executes:

https://github.com/hackoregon/deploy-scripts/blob/5165d85c8a80a1a67d666f9255bc1a56db311896/bin/ecs-deploy.sh#L144

From the log of the deployment here: https://travis-ci.org/hackoregon/2019-sandbox-backend/jobs/554900654

Steps to Reproduce the Problem

  1. Clone the following existing repo: https://github.com/hackoregon/2019-sandbox-backend/tree/travis
  2. Run the following command to fetch the deployment scripts:
    $ bin/fetch-scripts
  3. Run the following command to start a build:
    $ git tag 1.x && git push --tags

    Here 1.x is the next release version after the latest one listed at https://github.com/hackoregon/2019-sandbox-backend/releases. For example, if the latest release listed is 1.52, tag 1.53.

Expected Behavior

The ECS deploy and Travis build succeed.

Actual Behavior

The ECS deploy and Travis build fail with the output listed above.

Logs/Screenshots

See above

Related Code

See above

Any idea of problem? What to do to fix?

Probably has to do with the SSM parameters needing to be set.

nam20485 commented 5 years ago

@MikeTheCanuck @DingoEatingFuzz @BrianHGrant

MikeTheCanuck commented 5 years ago

Looks to me like a problem in the script’s code - “unary operator” - and might be the same problem as here:

https://github.com/silinternational/ecs-deploy/pull/58
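For context, bash prints "unary operator expected" when an unquoted variable inside `[ ... ]` expands to nothing, so the test collapses to something like `[ == value ]`. The sketch below reproduces that message and shows the usual fix of quoting the variable. It assumes the failing lines follow this pattern; `DEPLOY_MODE` is a hypothetical variable name, not one taken from ecs-deploy.sh.

```shell
#!/usr/bin/env bash
# DEPLOY_MODE is a hypothetical variable used only for illustration.
DEPLOY_MODE=""

# Unquoted: the empty value disappears, leaving `[ == service ]`,
# which bash rejects with "[: ==: unary operator expected".
check_unquoted() {
  [ $DEPLOY_MODE == "service" ]
}

# Quoted: the test becomes `[ "" == service ]`, a valid comparison
# that is simply false.
check_quoted() {
  [ "$DEPLOY_MODE" == "service" ]
}

# Print the error message bash emits for the unquoted form.
check_unquoted || echo "unquoted test failed with an error"

# The quoted form fails cleanly, with no error message.
check_quoted || echo "quoted test is false, no error"
```

If the script's comparisons look like the unquoted form, quoting each variable should silence these messages, though it would not by itself fix the later unbound variable failure.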

DingoEatingFuzz commented 5 years ago

The unary operator lines may be red herrings. Looks like it's failing in the getTaskDefinition function.

The first thing that function tries to do is look up the ECS service definition and task definition, neither of which exists yet.

Even though the ECR repo was created, the ECS service and task were not, and all three need to be in place before we can expect CI/CD to work.

The way we handled this last year was to "bootstrap" services by deploying them to prod with an expected task count of 0. This way the metadata was in place, but no resources were scheduled.
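One way such a bootstrap could look with the AWS CLI is sketched below. The thread does not say which tool was used last year, so treat this as an assumption. The helper name `bootstrap_service` and the cluster, service, and task definition names are placeholders, and the task definition would need to be registered first. The function only prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create an ECS service with a desired count of 0 so
# the service metadata exists but no tasks are scheduled.
# Arguments: cluster name, service name, task definition (all placeholders).
bootstrap_service() {
  local cluster="$1" service="$2" task_def="$3"
  local cmd=(aws ecs create-service
    --cluster "$cluster"
    --service-name "$service"
    --task-definition "$task_def"
    --desired-count 0)

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (the default): show the command instead of calling AWS.
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

bootstrap_service "example-cluster" "example-service" "example-task"
```

Defaulting to a dry run keeps the sketch safe to paste and lets the command be reviewed before anything is created.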

MikeTheCanuck commented 5 years ago

Alright, that’ll be my next task, hopefully tomorrow depending on how the yard project goes.

DingoEatingFuzz commented 5 years ago

I think it's also worth taking a stab at updating the deploy script to be resilient in this scenario. It could check if the service/task exists and if it doesn't, still push the new images to ECR and warn about there being no service to restart. Something like...

$ ./deploy.sh
...
Tagging image latest
Pushing to ECR...

Restarting service in ECS...
WARNING: Service <service_name> not found. Exiting early

This might make it easier to get things up and running incrementally.
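A minimal sketch of that check, assuming the AWS CLI is available to the script. The function names `service_exists` and `restart_service` are hypothetical and do not come from the existing deploy scripts. Only the restart step is covered here; tagging and pushing to ECR would run before it as they do today.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Returns 0 if the named ECS service exists and is ACTIVE, 1 otherwise.
# Any AWS CLI error (for example a missing cluster) is treated as "not found".
service_exists() {
  local cluster="$1" service="$2" count
  count="$(aws ecs describe-services \
    --cluster "$cluster" \
    --services "$service" \
    --query "length(services[?status=='ACTIVE'])" \
    --output text 2>/dev/null || echo 0)"
  [ "$count" = "1" ]
}

# Restart the service if it exists; otherwise warn and return success so
# the image push that already happened is not reported as a failed deploy.
restart_service() {
  local cluster="$1" service="$2"
  echo "Restarting service in ECS..."
  if ! service_exists "$cluster" "$service"; then
    echo "WARNING: Service $service not found. Exiting early"
    return 0
  fi
  aws ecs update-service \
    --cluster "$cluster" \
    --service "$service" \
    --force-new-deployment >/dev/null
}
```

Returning 0 in the missing-service case is a design choice: it lets CI pass while the infrastructure is still being set up, at the cost of a warning that someone has to notice in the log.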

nam20485 commented 5 years ago

I think it's also worth taking a stab at updating the deploy script to be resilient in this scenario. It could check if the service/task exists and if it doesn't, still push the new images to ECR and warn about there being no service to restart. Something like...

$ ./deploy.sh
...
Tagging image latest
Pushing to ECR...

Restarting service in ECS...
WARNING: Service <service_name> not found. Exiting early

This might make it easier to get things up and running incrementally.

OK I can make these changes.