hackoregon / civic-devops

Master collection point for issues, procedures, and code to manage the HackOregon Civic platform
MIT License

Update all 2018 API projects to use current deploy.sh #93

Closed MikeTheCanuck closed 6 years ago

MikeTheCanuck commented 6 years ago

The exemplar's deploy.sh has been enhanced with a couple of additions since last year's projects (both sketched below):

  1. adding the --no-include-email flag to aws ecr get-login to re-enable credential harvesting
  2. expanding the short-form command-line parameters for ecs-deploy.sh for readability
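
For reference, those two lines in the exemplar look roughly like this (the region, cluster and variable names here are placeholders, not necessarily the exact values in the script):

```sh
# 1. Log in to ECR; --no-include-email omits the deprecated "-e none" argument
#    so the emitted docker login command works with current Docker clients
eval "$(aws ecr get-login --no-include-email --region us-west-2)"

# 2. ecs-deploy.sh called with long-form flags instead of -c/-n/-i/-t
./bin/ecs-deploy.sh \
  --cluster "$CLUSTER_NAME" \
  --service-name "$DOCKER_SERVICE" \
  --image "$DOCKER_IMAGE" \
  --timeout 300
```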

Now we need to ensure that all of the 2018 API repos are in sync with this.

(I have this weird feeling that @nam20485 has recommended some additional enhancements somewhere, but I can't find it so hopefully this summons will get him to confirm or dismiss such allegations.)

MikeTheCanuck commented 6 years ago

Oh there it is - he's gone and added a bunch of new env vars: https://github.com/hackoregon/disaster-resilience-backend/blob/development/bin/deploy.sh

nam20485 commented 6 years ago

If you are going to roll out a new deploy.sh, I would use the example from the latest PR (https://github.com/hackoregon/backend-examplar-2018/pull/84) I created yesterday against backend-examplar-2018. It is the most up-to-date version of the script in terms of functionality and correctness, and the PR itself is the most succinct description of the required changes.

The https://github.com/hackoregon/disaster-resilience-backend repo you linked above has the pattern implemented, but the backend-examplar-2018 PR is a better reference for porting.

I would have to go and check, but I believe the backend-examplar-2018 PR version is the only one that pushes the Docker image with the correct namespace and name (i.e. "production/disaster-resilience-service").
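
i.e. the push target ends up looking something like this (the account ID, region and tag are placeholders):

```sh
# Illustrative only: registry host and tag are not the exact values in the script
docker push "$AWS_ACCOUNT_ID.dkr.ecr.us-west-2.amazonaws.com/production/disaster-resilience-service:$TAG"
```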

I guess if it were up to me I would say that this issue is blocked on approval and merge of the PR.

MikeTheCanuck commented 6 years ago

I’d love to approve the PR once I understand it, can figure out all the changes it’ll require in downstream repos, and can see what the final set of env vars will be, so we can make sure we aren’t flooding the system with overlapping or unnecessarily complex configuration.

The env var spread has become incoherent and the pace of change hasn’t slowed yet, so it’s getting hard to figure out everything we have to support, where each variable is used, and what purpose it serves. Rushing to make more changes before we have a clear picture of what we’ve already got makes this harder to mentally model. I could use some additional help in stabilising our CI/CD across the projects, and env vars feel like they’re at the nexus of the complexity.

I’m trying to appreciate all the contributions, but right now what I’m experiencing is too much change, directly competing with our ability to stabilise the deployment pipeline and get all the projects to a consistent config. I’m afraid this is heading towards being wholly unsupportable: a nightmare for anyone walking in who doesn’t have a weekend to read all our various posts, Slacks and PRs before they can make a change that doesn’t break more than it helps.

nam20485 commented 6 years ago

Mike, I think you are concerned about adding to the number of env vars, but this change doesn't add any new ones; it only replaces existing ones or introduces variables local to the shell script. The PR renames DOCKER_IMAGE, DOCKER_SERVICE, and DEPLOY_TARGET to more descriptive, more easily understood names.

That is one change. The second is that we now tag images with a traceable Travis build number, adding visibility and traceability to the deployed Docker images. The final change, getting the $DOCKER_SERVICE and $DOCKER_IMAGE variables working correctly, solves the problem you ran into when those two variables were first separated. It allows the Docker image name and the service name to be specified independently, so we can name the service e.g. "disaster-resilience-service" instead of each repo naming its service "api_production". This is what enables us to push the image to ECR at the URL you asked us to push to (i.e. /production/disaster-resilience-service).
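
As a rough sketch of how the second and third changes fit together (ECR_URI, CLUSTER and the exact flag spellings are assumptions on my part, not a verbatim copy of the PR's deploy.sh):

```sh
# Tag images with the Travis build number so deployed images are traceable
IMAGE_TAG="${TRAVIS_BUILD_NUMBER:-latest}"
docker build -t "$DOCKER_IMAGE:$IMAGE_TAG" .

# Image name and service name are now independent, so each repo can push to
# its own ECR repository and deploy to its own named service
docker tag  "$DOCKER_IMAGE:$IMAGE_TAG" "$ECR_URI/$DOCKER_IMAGE:$IMAGE_TAG"
docker push "$ECR_URI/$DOCKER_IMAGE:$IMAGE_TAG"
./bin/ecs-deploy.sh --cluster "$CLUSTER" \
  --service-name "$DOCKER_SERVICE" \
  --image "$ECR_URI/$DOCKER_IMAGE:$IMAGE_TAG"
```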

I am not sure which deploy.sh you would use as a baseline instead of this one, but I doubt it would get the Docker image deployment working the way you have told me you need it to work.

I certainly don't want to make any more work for you or introduce any additional complexity into the codebase or infrastructure. The only reason I was pushing for this as a candidate for rolling out deploy.sh changes is that I am fairly certain we will end up adopting it, or something very similar, soon afterwards to get the deployment working the way you want. Although there is certainly a case to be made for not adding it now, in order to manage the amount of change going in at the moment.

Maybe a better question is: how can I help, or what can I do, to reduce your anxiety about the number of environment variables we are using? I have a fairly good mental model of all of them, at least from the standpoint of a Docker image and its components. I also added some comments to your Google doc about where the various variables originate, where they are needed, which ones can be removed from your list, and what the smallest functioning set of variables required for the Docker images to run is.

You mentioned you could use help stabilizing the CI/CD deployment across the backend repos; is there anything I could tackle?

MikeTheCanuck commented 6 years ago

Well, based on the great work @nam20485 put in on this script, and the flurry of finalization activity over the weekend, it seems we have this deployed everywhere it counts. Closing as complete.