hackoregon / backend-examplar-2018

an example dockerized geo-aware django backend
MIT License

Document and assess last year's Travis and Deploy strategies. #51

Open bhgrant8 opened 6 years ago

bhgrant8 commented 6 years ago

Think it may be valuable, moving forward, to take some time and document what we know about how each project was integrated with Travis.

Things to look at:

Somehow we got every project moved through the chain, so we should be able to point to some learnings.

I plan to take some time over the weekend to look into this.

MikeTheCanuck commented 6 years ago

Helluva good thought Brian, I’ve been meaning to do something similar, so I’ll contribute here instead.

bhgrant8 commented 6 years ago

First Observation: The .travis.yml Files

Looking over the .travis.yml files from last year, all projects seemed to follow a basic pattern; I'll copy the Team Budget example here:

sudo: required
services:
  - docker
install:
  - pip install --upgrade --user awscli
before_script:
  - ./budget_proj/bin/getconfig.sh
script:
  - './budget_proj/bin/test-proj.sh -t'
after_success:
  - ./budget_proj/bin/docker-push.sh

Breaking this down:

sudo: required - allows Travis commands to be run with sudo in the build environment

services:
  - docker
install:
  - pip install --upgrade --user awscli

So two things here. First, install is a step in Travis's build lifecycle: it installs any dependencies we need at the OS level, which in practice means things related to AWS/deploy rather than anything needed within the Docker container or the Django project. The pip install command installs the AWS CLI, which is later used to push the built container to AWS ECR (Elastic Container Registry).

before_script:
  - ./budget_proj/bin/getconfig.sh
script:
  - './budget_proj/bin/test-proj.sh -t'

The script step is where the bulk of Travis's work happens. It should include any project build and test tasks. If it exits non-zero, the build is marked as failed, but the lifecycle still continues through to the after_failure step.

The test-proj.sh script builds the containers and then runs the test-entrypoint.sh script, which runs the tests.

after_success:
  - ./budget_proj/bin/docker-push.sh

Provided the script step exits with a success code, the after_success command is run. The docker-push.sh script essentially verifies that the current build is not a pull request and is on the master branch. Only in that case does it run the ecs-deploy script, which ships the successfully built and tested container to AWS using the awscli client that was installed earlier.

So we seem to see 3 main tasks in this:

  1. build the API container we want to deploy
  2. confirm the built container passes any specified tests
  3. provided the build is on the correct branch, deploy the built container off to AWS

Observations:

Overall this seems like a fairly sound basic pattern to continue using, unless we hit a specific case, such as removing the getconfig complexities. There may be some opportunities to use more of the build lifecycle steps to our advantage, for example, possible alerting on after_failure (see the sketch below). The "deploy" step is intriguing; however, I believe it only works if you are using a supported deploy provider, which I am not sure we fit into.
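For example, a minimal alerting sketch (SLACK_WEBHOOK_URL is an assumed Travis repository env var, not something we configured last year; TRAVIS_BUILD_NUMBER and TRAVIS_REPO_SLUG are standard Travis-provided variables):

after_failure:
  - 'curl -s -X POST -H "Content-type: application/json" --data "{\"text\": \"Travis build $TRAVIS_BUILD_NUMBER of $TRAVIS_REPO_SLUG failed\"}" "$SLACK_WEBHOOK_URL"'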

Last year's examples:

znmeb commented 6 years ago

where / when does the "docker-compose" happen, if it does?

bhgrant8 commented 6 years ago

Build and Testing Scripts

docker-compose operations were run as part of the "build-test" script invoked in the "script" step of the Travis build lifecycle, and thus feed into either "after_success" or "after_failure".

One example again is the "team-budget" script. (Because budget situated the API within a sub-directory of the repo, many of the example scripts include $PROJ_SETTINGS_DIR in directory paths; this would not be used in examplar, where the API is at the root of the repo):

https://github.com/hackoregon/team-budget/blob/master/budget_proj/bin/test-proj.sh

# Run all configured unit tests inside the Docker container
while getopts ":lt" opt; do
    case "$opt" in
        l)
          docker-compose -f $PROJ_SETTINGS_DIR/local-docker-compose.yml build
          docker-compose -f $PROJ_SETTINGS_DIR/local-docker-compose.yml run \
          --entrypoint /code/bin/test-entrypoint.sh $DOCKER_IMAGE
          ;;
        t)
          docker-compose -f $PROJ_SETTINGS_DIR/travis-docker-compose.yml build
          docker-compose -f $PROJ_SETTINGS_DIR/travis-docker-compose.yml run \
          --entrypoint /code/bin/test-entrypoint.sh $DOCKER_IMAGE
          ;;
        *)
          usage
          ;;
    esac
done

So we see two flags: -l builds and tests against local-docker-compose.yml for local development, while -t uses travis-docker-compose.yml in the Travis environment.

While building the images pulls in different compose files, we use the same test-entrypoint.sh in each environment (commented-out lines removed for clarity):

#!/bin/bash
export PATH=$PATH:~/.local/bin

python manage.py test --no-input --keepdb

I am not completely sure why we needed to update the PATH?

In terms of the script:

python manage.py test --no-input --keepdb

We see the basic manage.py test being run. The --no-input flag prevents the test runner from prompting for user input, allowing it to run unattended.

Most important is the --keepdb flag, meaning the database the tests run against is persisted from one test run to the next. Emergency Response followed this pattern as well, running all tests read-only against the production database; I still have to look at budget to see if it does the same (future post).

Observations

We are using the same script to accomplish two tasks: building a container, then testing it. There is an entrypoint script that overrides the default entrypoint the containers would run on docker-compose up. We may need to use a --noinput flag to make sure the script does not stop and wait for user input. Connecting to a persistent database is a path some projects used; when doing so, no migrations were run, to prevent any changes to the db.

Other Examples

So this is one area where there is some differentiation worth looking into:

znmeb commented 6 years ago

Ah - so the actual work is done in shell scripts, not in travis.yml.

bhgrant8 commented 6 years ago

Yeah, in our setup, I think once you get past very simple commands, doing so makes things a bit easier.

bhgrant8 commented 6 years ago

Testing Database Connections

So, continuing to work through the testing setup: before we get to the tests themselves being run, let's look at the datastores that teams connected to for testing, and how.

Emergency Response

Starting here as I know the most.

When I came into the program to start building the API, we already had a fairly developed database live on AWS. I was given read-only creds to the prod AWS database, and after hacking around some options, I ended up configuring my tests to run against the production database, since they were not creating or deleting any data.

This strategy involved:

if 'test' in sys.argv or 'test_coverage' in sys.argv:
    DATABASES = {
        'default': {
            'ENGINE': project_config.AWS['ENGINE'],
            'NAME': project_config.AWS['NAME'],
            'HOST': project_config.AWS['HOST'],
            'PORT': 5432,
            'USER': project_config.AWS['USER'],
            'PASSWORD': project_config.AWS['PASSWORD'],
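            # the TEST block points Django's test runner at the existing "fire"
            # database; combined with --keepdb, it is reused rather than created
            # and destroyed on each run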
            'TEST': {
                    'NAME': 'fire',
                },
        }
    }

Team Budget

I tried to step through the repo and could not find any specific test database config. My assumption is that the deployed database included a test version as well, which was then persisted. Whether or not that is correct, it seems like a good pattern to avoid testing directly against prod dbs while still using the same read-only creds. Questions: is this correct, or am I missing something? How would we create and then deploy the test version of the db? Prior to the s3 upload, or could this replication be part of the devops process? (One possible approach is sketched below.)
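One possible approach (just a sketch with hypothetical host/role/database names, not something any 2017 project actually did) would be to replicate the prod data into a separate test database as part of the devops process:

# hypothetical example: copy the prod "budget" database into a "budget_test" database
createdb -h "$PROD_HOST" -U "$ADMIN_USER" budget_test
pg_dump -h "$PROD_HOST" -U "$ADMIN_USER" budget | psql -h "$PROD_HOST" -U "$ADMIN_USER" -d budget_test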

Team Housing

With housing using py.test, they supplied a pytest.ini which pointed to the test settings:


from .settings import *

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:",
    }
}

EMAIL_BACKEND = 'django.core.mail.backends.locmem.EmailBackend'

So we see an in-memory SQLite db used in the test environment. Not an uncommon practice, but it doesn't really test the actual production database connection and services.
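For reference, the pytest.ini that points at such test settings would look roughly like this (a sketch; the actual module path in the housing repo may differ):

[pytest]
# requires the pytest-django plugin; the settings module path here is a guess
DJANGO_SETTINGS_MODULE = housing_api.test_settings
python_files = test_*.py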

Team Homeless

It appears that they are using Django's fixtures to provide some test data, but are not making a connection to an actual backend datastore. Similar to housing, this is a common pattern, but if we want to verify a functioning database connection, it does not actually accomplish that. If we are looking to test only the Python code, it is an acceptable option. https://github.com/hackoregon/teamHomelessness/blob/master/homelessAPI/homelessApp/tests.py#L6
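For anyone unfamiliar with the pattern, a fixture-based test looks roughly like this (a sketch with hypothetical model and fixture names, not copied from the homeless repo):

from django.test import TestCase

from .models import ServiceProvider  # hypothetical model


class ServiceProviderTests(TestCase):
    # Django loads this JSON fixture into a throwaway test database before each test
    fixtures = ['providers.json']

    def test_fixture_data_loaded(self):
        self.assertTrue(ServiceProvider.objects.exists())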

Team Transportation

Guess you don't need a testing backend if you don't actually write any tests?

znmeb commented 6 years ago

We didn't have any tests for Transportation ... the best guess as to what the final app looked like was the local development environment running on an Ubuntu 16.04.x LTS laptop. ;-)

https://github.com/hackoregon/transportation-backend/tree/master/ubuntu-local-deploy

MikeTheCanuck commented 6 years ago

In Budget's case, we knew from the start that we would never write to the database, so it never occurred to me that testing against the production database would be a risk. (It is only a risk if someone commits, and someone else merges, Django code that writes to the DB, but the risk certainly grows the greater the distance from those tribal assumptions.)

Not sure what the best strategy is here - duplicating the databases in production is a huge waste of memory for 99% of the time, but I agree that testing against a local sqlite3 doesn’t catch one of our biggest dependencies.

In theory we could use separate creds (test creds = read-only), but if anyone plans to write to their DB then we’re hosed.

In a monied organisation we'd just have a separate test/QA infrastructure, but I am loath to spend that kind of money on behalf of an org that just recently asked for tax-deductible individual donations.

bhgrant8 commented 6 years ago

I agree that in a tradeoff between budget and a "pristine" QA environment, our budget is the priority. Mostly I wanted to make this decision explicit and documented.

MikeTheCanuck commented 6 years ago

Environment Variable usage

In 2017 API projects, the following env vars were configured in each Travis repo:

Examination of configured Travis env vars

In the analysis below, nearly all findings were based on the team-budget repo. Variations between projects should be accounted for as well, but rather than wait until I had the extra hours to review those too, I'm posting this for others to build upon.

Implicit Travis env vars

Implicit Docker env vars

Env vars unique to projects

Hard-coded environment variables

QUESTION (maybe just for myself): when passed through docker-compose.yml as env vars, are the passed-in env vars implicitly used by anything else other than the /bin/ scripts?
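For context on the mechanism, the pass-through happens via the environment key in the compose file; a minimal sketch (service and variable names are illustrative, not copied from a 2017 repo):

# illustrative travis-docker-compose.yml excerpt: variables listed without a value
# are read from the shell environment Travis provides and injected into the
# container, where Django settings can pick them up via os.environ
version: '2'
services:
  api:
    build: .
    environment:
      - DJANGO_SECRET_KEY
      - POSTGRES_HOST
      - POSTGRES_PASSWORD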

MikeTheCanuck commented 6 years ago

Proposal for Travis env vars

All other common env vars used in last year's Travis settings (excepting the DOCKER_USERNAME and DOCKER_PASSWORD used in transportation-backend) are still valid and useful.

MikeTheCanuck commented 6 years ago

Travis configuration for builds

There are a number of basic settings in Travis that we use, in conjunction with communicated (tribal?) expectations, to enable Hack Oregon to get consistent builds and deploys:

These settings only work because we have configured the docker-push.sh script to do the following:

# Tag, Push and Deploy only if it's not a pull request
if [ -z "$TRAVIS_PULL_REQUEST" ] || [ "$TRAVIS_PULL_REQUEST" == "false" ]; then
  # Push only if we're testing the master branch
  if [ "$TRAVIS_BRANCH" == "master" ]; then
    # ... tag, push, and ecs-deploy steps run here ...
  fi
fi

This sets up a pattern of the following:

MikeTheCanuck commented 6 years ago

How Travis hands off to AWS

This is due to the "magic" of the docker-push.sh script, e.g. team-budget's:

    export PATH=$PATH:$HOME/.local/bin
    echo Getting the ECR login...
    eval $(aws ecr get-login --region $AWS_DEFAULT_REGION)
    echo Running docker push command... # Troubleshooting
    docker push "$DOCKER_REPO"/"$DEPLOY_TARGET"/"$DOCKER_IMAGE":latest
    echo Running ecs-deploy.sh script...
    ./$PROJ_SETTINGS_DIR/bin/ecs-deploy.sh  \
     -n "$ECS_SERVICE_NAME" \
     -c "$ECS_CLUSTER"   \
     -i "$DOCKER_REPO"/"$DEPLOY_TARGET"/"$DOCKER_IMAGE":latest \
     --timeout 300

There are four key actions here:

  1. export PATH...
  2. eval $(aws ecr get-login)...
  3. docker push...
  4. ecs-deploy.sh...

Breaking this down...

export PATH=$PATH:$HOME/.local/bin

IIRC, this is here to ensure that the aws CLI (installed via .travis.yml) is on the $PATH

eval $(aws ecr get-login --region $AWS_DEFAULT_REGION)
docker push "$DOCKER_REPO"/"$DEPLOY_TARGET"/"$DOCKER_IMAGE":latest

This pushes the image that was just built in the Travis environment (by build-proj.sh) up to the AWS ECR registry. IIUC, this pushes the $DOCKER_IMAGE to the $DOCKER_REPO server, into the $DEPLOY_TARGET repository, and applies the "latest" tag.

What mystifies me (despite great articles like this one) is whether there's an implicit docker tag command run elsewhere in our stack to have pre-tagged the image before we push it.
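For illustration only (an assumption about where such a step could live, not confirmed against last year's scripts), an explicit tag command would look like:

# give the locally built image the fully qualified ECR name so "docker push" knows where to send it
docker tag "$DOCKER_IMAGE":latest "$DOCKER_REPO"/"$DEPLOY_TARGET"/"$DOCKER_IMAGE":latest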

./$PROJ_SETTINGS_DIR/bin/ecs-deploy.sh  \
     -n "$ECS_SERVICE_NAME" \
     -c "$ECS_CLUSTER"   \
     -i "$DOCKER_REPO"/"$DEPLOY_TARGET"/"$DOCKER_IMAGE":latest \
     --timeout 300

This final script is a third-party script that enables Travis to tell AWS to pull a copy of the $DEPLOY_TARGET/$DOCKER_IMAGE:latest from $DOCKER_REPO and deploy it to the $ECS_SERVICE_NAME in $ECS_CLUSTER.

For example, for the team-budget project from 2017, this will tell AWS to pull integration/budget-service:latest from 845828040396.dkr.ecr.us-west-2.amazonaws.com and deploy it to hacko-integration-BudgetService-16MVULLFXXIDZ-Service-1BKKDDHBU8RU4 on the hacko-integration cluster.

Travis output

When everything is successful, the Travis build log will display something like the following at the end of the log:

$ ./budget_proj/bin/docker-push.sh
Getting the ECR login...
Flag --email has been deprecated, will be removed in 1.13.
Login Succeeded
Running docker push command...
The push refers to a repository [845828040396.dkr.ecr.us-west-2.amazonaws.com/integration/budget-service]
Running ecs-deploy.sh script...
Using image name: 845828040396.dkr.ecr.us-west-2.amazonaws.com/integration/budget-service:latest
Current task definition: arn:aws:ecs:us-west-2:845828040396:task-definition/budget-service:121
New task definition: arn:aws:ecs:us-west-2:845828040396:task-definition/budget-service:122
Service updated successfully, new task definition running.
MikeTheCanuck commented 6 years ago

.travis.yml configuration

The configuration-in-common for all of last year's API projects' .travis.yml is this:

sudo: required
services:
  - docker
install:
  - pip install --upgrade --user awscli
before_script:
  - ./bin/getconfig.sh
script:
  - './bin/test-proj.sh -t'
after_success:
  - ./bin/docker-push.sh

(That is, except the emergency-response-backend, which somehow skipped the before_script step to getconfig.sh.)

Two of the projects went much further and embedded a bunch of extra, undocumented setup work (that hopefully we can avoid in this year's projects) in the Travis setup:

bhgrant8 commented 6 years ago

I had the getconfig embedded into the other shell scripts on emergency response


nam20485 commented 6 years ago

Comparing what you gave for team budget's .travis.yml to what we have currently in the exemplar, I can see three differences:

  1. The script command line arguments are slightly different but seem to be semantically similar. They use -t and -l while our scripts use -p and -d. I believe our -p corresponds to their -t.

  2. Our repo separates what they have in test-proj.sh into two script files, build.sh and test.sh.

  3. Our repo does not contain two of the scripts: docker-push.sh (yet), or getconfig.sh (probably never will)

To achieve the same level of Travis behavior as e.g. team budget's backend, we could implement the following changes in the exemplar repo:

  1. Change the script: section to call bin/build.sh -p and bin/test.sh -p
  2. Remove reference to getconfig.sh from before_script: stanza
  3. Create and implement docker-push.sh

Leaving us with a .travis.yml that looks like:

sudo: required

services:
  - docker

install:
  - pip install --upgrade --user awscli

script:
  - ./bin/build.sh -p
  - ./bin/test.sh -p

after_success:
  - ./bin/docker-push.sh
nam20485 commented 6 years ago

Testing on the disaster-resilience-backend repo, Travis builds running with the config outlined in the post above seem to create the api_production docker image successfully. One key problem, though, is that the .env file is not there, so the `PRODUCTION_*` environment variables are not set, resulting in the following messages:

...
$ ./bin/build.sh -p
WARNING: The PRODUCTION_POSTGRES_USER variable is not set. Defaulting to a blank string.
WARNING: The PRODUCTION_POSTGRES_NAME variable is not set. Defaulting to a blank string.
WARNING: The PRODUCTION_POSTGRES_HOST variable is not set. Defaulting to a blank string.
WARNING: The PRODUCTION_POSTGRES_PORT variable is not set. Defaulting to a blank string.
WARNING: The PRODUCTION_POSTGRES_PASSWORD variable is not set. Defaulting to a blank string.
WARNING: The PRODUCTION_DJANGO_SECRET_KEY variable is not set. Defaulting to a blank string.
Building api_production
...
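One possible fix (just a sketch, not implemented in the repo) would be to generate the .env file from Travis repository env vars in a before_script step, assuming the PRODUCTION_* values are configured in the Travis settings:

before_script:
  # write any PRODUCTION_* variables defined in Travis repo settings into the
  # .env file that docker-compose expects (the || true keeps the step from
  # failing when none are defined)
  - 'printenv | grep "^PRODUCTION_" > .env || true'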
MikeTheCanuck commented 6 years ago

New env vars in play this year