ubc / canvas-data-2

Infrastructure and code to maintain a Canvas Data 2 replica database in AWS.
MIT License

canvas-data-2-aws - WORK IN PROGRESS

This project contains source code and supporting files for a serverless application that you can use to download and maintain a Canvas Data 2 replica database. You can deploy this application to your AWS account with the SAM CLI. It includes the following files and folders.

This application uses an AWS Step Function to orchestrate the workflow:

workflow diagram

Application workflow

  1. The Step Function is executed on a schedule via EventBridge (every three hours by default).
  2. The first step executes the list_tables Lambda function, which retrieves the list of CD2 tables from the API.
  3. The list of tables is passed to a Map step which executes the following steps for each item in the list:
    1. The sync_table Lambda function is executed. This returns either success or init_needed (if the table doesn't exist in the database yet).
    2. The output of sync_table is checked: if the table successfully synced, the iteration is complete. If init_needed was returned, the init_table function is executed.
    3. If executed, the output of init_table is checked; error handling TBD
  4. Once all iterations are complete, a notification is sent to an SNS topic.
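
The control flow described above can be sketched in plain Python. This is only an illustration of the branching logic, not the actual state machine definition: the function names passed in are stand-ins for the Lambda functions, the "initialized" return value in the usage note is hypothetical, and the loop runs sequentially, whereas a Step Functions Map step can run iterations concurrently.

```python
from typing import Callable, Dict, List

# Status strings named in the workflow description.
SUCCESS = "success"
INIT_NEEDED = "init_needed"


def run_workflow(
    list_tables: Callable[[], List[str]],
    sync_table: Callable[[str], str],
    init_table: Callable[[str], str],
    notify: Callable[[Dict[str, str]], None],
) -> Dict[str, str]:
    """Mimic the workflow: list tables, sync each one, init when needed, notify."""
    results: Dict[str, str] = {}
    for table in list_tables():       # stands in for the Map step
        state = sync_table(table)
        if state == INIT_NEEDED:      # table does not exist in the database yet
            state = init_table(table)
        results[table] = state
    notify(results)                   # stands in for the SNS notification
    return results
```

For example, calling `run_workflow` with a `sync_table` stub that returns `init_needed` for an unknown table will route that table through `init_table` before the notification callback receives the combined results.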

Prerequisites

It will be helpful to have a working knowledge of AWS services and the AWS Console. Before you can deploy the application you will need to have the following available:

By default the database will not have a public IP address and will not be accessible outside of your VPC. You will need to configure network access to the database as appropriate for your situation.

Deploying the application

The Serverless Application Model Command Line Interface (SAM CLI) is an extension of the AWS CLI that adds functionality for building and testing Lambda applications. It uses Docker to run your functions in an Amazon Linux environment that matches Lambda. It can also emulate your application's build environment and API.

To use the SAM CLI to deploy this application, you need the following tools.

To build and deploy your application for the first time, run the following in your shell:

sam build
sam deploy --guided

The first command will build the source of your application. The second command will package and deploy your application to AWS, with a series of prompts:

Initial Database Setup

Deploying this application will create:

In order for the application to use that credential to connect to the database, a database user must be created and granted appropriate privileges. A helper script is included that will take care of this setup.

After deploying the SAM app, run this script. You must have valid AWS credentials before running the script.

pip install -r setup/requirements.txt
./setup/prepare_aurora_db.py --stack-name <stack name returned by the SAM deployment>

Occasionally the schema for a CD2 table will change. The DAP library will take care of applying these changes to the database, but they will not succeed if you have created views that depend on the table. To handle this situation, the sync_table Lambda function will attempt to drop and recreate any views that depend on the table being synced. The pgsql functions necessary to do this can be found in this repository: https://github.com/rvkulikov/pg-deps-management. You will need to run the ddl.sql script in your database to create the necessary functions. (details tbd)

Configuration

In order for the application to use the DAP API, you will need to provide a client ID and secret.

The application uses AWS Systems Manager (SSM) Parameter Store to securely store these values and retrieve them at runtime. To store your client ID and secret:

aws ssm put-parameter --name '/<environment>/canvas_data_2/dap_client_id' --type SecureString --value '<your client ID>'
aws ssm put-parameter --name '/<environment>/canvas_data_2/dap_client_secret' --type SecureString --value '<your client secret>'

where <environment> is either dev or prod. You can also use the AWS SSM console to manage the parameters.
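
As a rough illustration of the runtime side, the sketch below reads the two parameters back with the boto3 SSM client. The helper names (`parameter_name`, `load_dap_credentials`) are hypothetical and are not taken from this repository. The client is passed in as an argument so the helpers can be exercised without AWS access.

```python
from typing import Any, Tuple


def parameter_name(environment: str, key: str) -> str:
    """Build the parameter path used in the put-parameter commands above."""
    return f"/{environment}/canvas_data_2/{key}"


def load_dap_credentials(ssm_client: Any, environment: str) -> Tuple[str, str]:
    """Fetch and decrypt the DAP client ID and secret from Parameter Store."""
    values = []
    for key in ("dap_client_id", "dap_client_secret"):
        response = ssm_client.get_parameter(
            Name=parameter_name(environment, key),
            WithDecryption=True,  # needed to read SecureString values as plain text
        )
        values.append(response["Parameter"]["Value"])
    return values[0], values[1]


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   client_id, client_secret = load_dap_credentials(boto3.client("ssm"), "dev")
```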

Running the application

By default the workflow that synchronizes the database will run every three hours. You can also run the workflow manually via the AWS Console: navigate to the Step Functions console, find your CD2RefreshStateMachine in the list, and click the Start execution button.
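
If you prefer to start a run from a script rather than the console, something along these lines may work with the boto3 Step Functions client. It is a sketch under two assumptions that this README does not confirm: that the deployed state machine's name contains "CD2RefreshStateMachine", and that the workflow accepts an empty JSON object as input. The helper names are hypothetical.

```python
import json
from typing import Any, Optional


def find_state_machine_arn(sfn_client: Any, name_fragment: str) -> Optional[str]:
    """Return the ARN of the first state machine whose name contains name_fragment."""
    paginator = sfn_client.get_paginator("list_state_machines")
    for page in paginator.paginate():
        for machine in page["stateMachines"]:
            if name_fragment in machine["name"]:
                return machine["stateMachineArn"]
    return None


def start_refresh(sfn_client: Any, name_fragment: str = "CD2RefreshStateMachine") -> str:
    """Start one execution of the refresh workflow and return its execution ARN."""
    arn = find_state_machine_arn(sfn_client, name_fragment)
    if arn is None:
        raise LookupError(f"No state machine name contains {name_fragment!r}")
    # Assumption: the workflow needs no input, so an empty JSON object is sent.
    response = sfn_client.start_execution(stateMachineArn=arn, input=json.dumps({}))
    return response["executionArn"]


# Usage (requires boto3 and valid AWS credentials):
#   import boto3
#   print(start_refresh(boto3.client("stepfunctions")))
```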

This application uses AWS Lambda to run the init and sync steps for each CD2 table. If the init or sync step for any given table takes longer than 15 minutes (the limit on how long Lambda functions can run), the workflow will fail. You will be able to see the error in the AWS Step Functions console. If this happens, you'll need to perform the first initialization for the problematic table manually using the DAP client.

TODO: details on how to initialize a table using the DAP client

Cleanup

To delete the application that you created, use the AWS CLI. Assuming you used your project name for the stack name, you can run the following:

aws cloudformation delete-stack --stack-name canvas-data-2

Alternatively, you can delete the stack in the CloudFormation console (within the AWS web console).