A set of AWS services for downloading and ingesting Zoom meeting videos into Opencast
The Zoom Ingester (a.k.a., "Zoom Ingester Pipeline", a.k.a., "ZIP") is Harvard DCE's mechanism for moving Zoom recordings out of the Zoom service and into our own video management and delivery system, Opencast. It allows DCE to deliver recorded Zoom class meetings and lectures alongside our other, non-Zoom video content.
When deployed, the pipeline will have an API endpoint that must be registered in your Zoom account as a receiver of completed recording events. When Zoom has completed the processing of a recorded meeting video it will send a "webhook" notification to the pipeline's endpoint. From there the recording metadata will be passed along through a series of queues and Lambda functions before finally being ingested into Opencast. Along the way, the actual recording files will be fetched from Zoom and stored in S3. Alternatively, from the Opencast admin interface, a user can kick off an "on-demand" ingestion by entering the identifier of a Zoom recording and the corresponding Opencast series into which it should be ingested. The On-Demand ingest function then fetches the recording metadata from the Zoom API and emulates a standard webhook.
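To make the flow concrete, here is a sketch of the metadata a webhook handler might extract from a notification. The field names follow Zoom's documented `recording.completed` payload; the handler itself is illustrative, not ZIP's actual code:

```python
import json

def parse_recording_completed(body: str) -> dict:
    """Extract minimal metadata from a Zoom 'recording.completed' notification."""
    event = json.loads(body)
    obj = event["payload"]["object"]
    return {
        "uuid": obj["uuid"],           # unique id of this meeting instance
        "meeting_id": obj["id"],       # the (reusable) Zoom meeting id
        "topic": obj["topic"],
        "start_time": obj["start_time"],
        # the actual files are later fetched from these urls and stored in S3
        "files": [f["download_url"] for f in obj.get("recording_files", [])],
    }

# Heavily abbreviated example payload:
notification = json.dumps({
    "event": "recording.completed",
    "payload": {"object": {
        "uuid": "ajXp112QmuoKj4854875==",
        "id": 86168921331,
        "topic": "Example Lecture",
        "start_time": "2024-01-10T15:00:00Z",
        "recording_files": [{"download_url": "https://zoom.us/rec/download/abc"}],
    }},
})
meta = parse_recording_completed(notification)
```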
Info on Zoom's API and webhook functionality can be found in Zoom's developer documentation.
Requirements:

- python + `virtualenv`
- node + npm (Note: as of v4 you no longer need the `aws-cdk` toolkit installed as a prerequisite. We will install that as part of the setup detailed below.)

Setup:

1. Run `aws configure` at some point so that you have at least one set of credentials in an `~/.aws/credentials` file.
1. Create and activate a virtualenv: `virtualenv venv && source venv/bin/activate`
1. Python dependencies are managed with `pip-tools`, so you need to install that first: `pip install pip-tools`
1. Install the dev dependencies: `pip-sync requirements/dev.txt`
1. Run `npm install` in the project root.
1. Copy `example.env` to `.env` and update as necessary. See the inline comments for an explanation of each setting.
1. Optionally set `AWS_PROFILE` in your `.env` file. Otherwise you'll need to remember to set it in your shell session prior to running any `invoke` commands.
1. Run `invoke test` to confirm the installation.
1. Run `invoke -l` to see a list of all available tasks and their descriptions.
1. Install the pre-commit hooks: `pip install pre-commit && pre-commit install`
1. Set `LAMBDA_CODE_BUCKET` in `.env`.
1. Run `invoke stack.create` to build the CloudFormation stack.
1. Import the Zoom meeting schedule: `invoke schedule.import-csv [filepath]`

That's it. Your Zoom Ingester is deployed and operational. To see a summary of the state of the CloudFormation stack and the Lambda functions, run `invoke stack.status`.
Set `GSHEETS_DOC_ID` and `GSHEETS_SHEET_NAMES` in your `.env` file.

Since these credentials are shared within an AWS account, the following setup only needs to be done once per AWS account:

- Obtain a Google service account and its `service_account.json` credentials file.
- Run `invoke schedule.save-creds [-f credentials-filename]` to store the credentials.
The ZIP API endpoint is available in the output of `invoke stack.create`, `invoke stack.update`, and `invoke stack.status` under `Outputs`.
```javascript
// Adds a custom "ZIP" menu to the spreadsheet UI when it is opened.
function onOpen() {
  var ui = SpreadsheetApp.getUi();
  ui.createMenu('ZIP')
    .addItem('Update ZIP Schedule', 'updateZoomIngester')
    .addToUi();
}

// POSTs to the ZIP schedule update endpoint and shows the response in a toast.
function updateZoomIngester() {
  var url = "[your stack endpoint]";
  var options = {
    'method': 'POST',
  };
  var response = UrlFetchApp.fetch(url, options);
  Logger.log(response);
  SpreadsheetApp
    .getActiveSpreadsheet()
    .toast(response.getContentText(), "Schedule Updated", 3);
}
```
Go to https://api.slack.com/apps, log in if necessary, go to "Build" then "Your Apps", and click "Create New App".
Select "From scratch", give the app a name, and pick the Slack workspace to develop your app in. Then click "Create App" to create the app.
You should now be on the "Settings" > "Basic Information" page for your app. Open "Add features and functionality".
Click on "Interactive Components", then toggle "Interactivity" on. Paste the Slack endpoint URL into the "Request URL" field. (This can be found in the CDK stack outputs and should end with `/slack`.) Save changes.
Go to "Slash Commands" for the app and click "Create New Command". Enter `/zip` for the command (or an alternative of your choice). Paste the Slack endpoint into the "Request URL" field, enter a short description, and save the new command.
Go to "Permissions". Scroll down to "Scopes". Add the OAuth Scope "usergroups:read". (The "commands" scope should already be there.)
Install the app in the workspace.
Add the following environment variables to your `.env` file:

- `SLACK_API_TOKEN`
- `SLACK_SIGNING_SECRET`
- `SLACK_ZIP_CHANNEL`: the name of the Slack channel in which you would like to allow usage of the Slack integration.
- `SLACK_ALLOWED_GROUPS`: a comma-delimited list of Slack groups whose members will be allowed to use the integration.

Run `invoke stack.update` and `invoke deploy.slack --do-release` to release the new values of the environment variables.
Once the Zoom Ingester pipeline is deployed you can configure your Zoom account to send completed recording and other event notifications to it via the Zoom Webhook settings.
- Find the webhook endpoint URL in the `invoke stack.status` output or by browsing to the release stage of your API Gateway REST API.
- Save the webhook app's secret token in your `.env` file as `WEBHOOK_VALIDATION_SECRET_TOKEN`.
- Run `invoke stack.update` to deploy the new `WEBHOOK_VALIDATION_SECRET_TOKEN` value to the webhook Lambda function. This is necessary for the app endpoint validation step coming up.
- Click "+ Add Events" and subscribe to the following events:
  - For automatic ingests:
  - For status updates:
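Zoom's endpoint validation works by sending an `endpoint.url_validation` event containing a `plainToken`; the endpoint must echo the token back along with an HMAC-SHA256 hash of it computed with the secret token. The event name and response shape here follow Zoom's documented challenge flow; ZIP's actual handler code may differ:

```python
import hashlib
import hmac

def validation_response(plain_token: str, secret_token: str) -> dict:
    """Build the JSON body Zoom expects back from an
    endpoint.url_validation challenge."""
    digest = hmac.new(
        secret_token.encode(),
        plain_token.encode(),
        hashlib.sha256,
    ).hexdigest()
    return {"plainToken": plain_token, "encryptedToken": digest}

# Example challenge; the token values are made up.
resp = validation_response("qgg8vlvZRS6UYooatFL8Aw", "example-secret")
```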
The easiest way to find a listing of the endpoints for your stack is to run `invoke stack.status` and look in the Outputs. Identify the endpoints by their `ExportName`.
Description: Receives webhook notifications from Zoom for ingests or status updates and receives on-demand ingest requests forwarded from the on-demand endpoint.
Endpoint: `POST /new_recording`

ExportName: `<stack-name>-webhook-url`
Accepted Zoom webhook notifications for status updates:
Accepted Zoom webhook notifications for ingests:
Description: Initiates a new on-demand ingest from Opencast.

Endpoint: `POST /ingest`

ExportName: `<stack-name>-ingest-url`
Request Body Schema
| Parameter | Required? | Type | Description |
|---|---|---|---|
| `uuid` | Yes | string | Either the recording uuid or the link to the recording files. Recording files link example: `https://zoom.us/recording/management/detail?meeting_id=ajXp112QmuoKj4854875%3D%3D` |
| `oc_series_id` | No | string | Opencast series id to ingest this recording to. Default: the recording is only ingested if it matches the existing ZIP schedule. |
| `oc_workflow` | No | string | Override the default Opencast workflow. |
| `allow_multiple_ingests` | No | boolean | Whether to allow the same Zoom recording to be ingested multiple times into Opencast. Default: false. |
| `ingest_all_mp4` | No | boolean | Whether to ingest (and archive) all mp4 files. Default: false. |
Request Body Example
```json
{
  "uuid": "ajXp112QmuoKj4854875==",
  "oc_series_id": "20210299999",
  "oc_workflow": "workflow_name",
  "allow_multiple_ingests": false,
  "ingest_all_mp4": false
}
```
Description: Update the ZIP schedule.
Endpoint: `POST /schedule_update`

ExportName: `<stack-name>-schedule-url`

Parameters: none. The schedule is retrieved from the stack's associated Google Sheet.
Description: Check the status of a recording.
Endpoint: `GET /status`

ExportName: `<stack-name>-status-url`
Request Path Parameters
Provide only one of the following:

- `meeting_id` - A Zoom meeting id.
- `seconds` - Retrieve statuses updated within the last X seconds.
Request Examples
Retrieve all statuses updated within the last 10 seconds:

`GET https://<your-stack-endpoint-url>/status?seconds=10`

Retrieve the current status of all recordings with Zoom meeting id 86168921331:

`GET https://<your-stack-endpoint-url>/status?meeting_id=86168921331`
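Because `meeting_id` and `seconds` are mutually exclusive, a small helper can enforce that when building the query URL. The base URL is a placeholder and `status_url` is a hypothetical convenience function, not part of ZIP:

```python
from urllib.parse import urlencode

def status_url(base_url: str, meeting_id=None, seconds=None) -> str:
    """Build a /status query URL; exactly one parameter is allowed."""
    if (meeting_id is None) == (seconds is None):
        raise ValueError("provide exactly one of meeting_id or seconds")
    if meeting_id is not None:
        query = {"meeting_id": meeting_id}
    else:
        query = {"seconds": seconds}
    return f"{base_url}/status?{urlencode(query)}"

url = status_url("https://example.execute-api.us-east-1.amazonaws.com",
                 meeting_id=86168921331)
```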
- In your `.env` file, set `STACK_NAME` to a unique value.
- Run `invoke deploy.all --do-release` to push changes to your Lambda functions. Alternatively, to save time, if you are only editing one function, run `invoke deploy.[function name] --do-release`.
- If you change anything in `./cdk` or update environment variables, you must also (or instead) run `invoke stack.diff` to inspect the changes and `invoke stack.update` to apply them.
- Run `invoke exec.webhook [options]` to initiate the pipeline. See below for options.

`invoke exec.webhook [uuid]`

Options: `--oc-series-id=XX`

This task will recreate the webhook notification for the recording identified by `uuid` and manually invoke the `/new_recording` API endpoint.
`invoke exec.pipeline [uuid]`

Options: `--oc-series-id=XX`

Similar to `exec.webhook` except that this also triggers the downloader and uploader functions to run and reports success or error for each.
`invoke exec.on_demand [uuid]`

Options: `--oc-series-id=XX --oc_workflow=XX --allow-multiple-ingests --ingest-all-mp4`

This task will manually invoke the `/ingest` endpoint. This is the endpoint used by the Opencast "+Zoom" tool.

- `--oc-series-id=XX` - Specify an Opencast series id.
- `--oc_workflow=XX` - Select an Opencast workflow other than the default.
- `--allow-multiple-ingests` - Allow multiple ingests of the same recording (for testing purposes).
- `--ingest-all-mp4` - Ingest (for archival purposes) all mp4 files associated with the requested recording.
Incoming Zoom recordings are ingested to an Opencast series based on two pieces of information:
The Zoom Ingester pipeline includes a DynamoDB table that stores information about when Zoom classes are held. This is because the same Zoom series id can be used by different courses. To determine the correct Opencast series that the recording should be ingested to we need to also know what time the meeting occurred.
The current authority for Zoom meeting schedules is a Google spreadsheet. To populate our DynamoDB table from the spreadsheet data, we export the spreadsheet to CSV and then import it into DynamoDB using the `invoke schedule.import-csv [filepath]` task.
If a lookup to the DynamoDB schedule data does not find a mapping, the uploader function will log a message to that effect and return. During testing/development, this can be overridden by setting `DEFAULT_SERIES_ID` in the Lambda function's environment. Set that to whatever test series you want to use and all unmapped meetings will be ingested to that series.
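To illustrate why the schedule table matters, here is a toy lookup: the same Zoom meeting id maps to different Opencast series depending on when the meeting occurred. The schedule schema below is invented for illustration and does not match ZIP's actual DynamoDB schema:

```python
from datetime import datetime

# Hypothetical schedule entries: one Zoom meeting id shared by two courses.
SCHEDULE = [
    {"meeting_id": 86168921331, "weekday": "Mon", "hour": 10,
     "oc_series_id": "20240111111"},
    {"meeting_id": 86168921331, "weekday": "Wed", "hour": 14,
     "oc_series_id": "20240122222"},
]

DEFAULT_SERIES_ID = None  # set during testing to catch unmapped meetings

def lookup_series(meeting_id: int, start: datetime):
    """Return the Opencast series for a recording, or the default if unmapped."""
    weekday = start.strftime("%a")
    for entry in SCHEDULE:
        if (entry["meeting_id"] == meeting_id
                and entry["weekday"] == weekday
                and entry["hour"] == start.hour):
            return entry["oc_series_id"]
    return DEFAULT_SERIES_ID

# 2024-01-10 was a Wednesday, so this resolves to the second course.
series = lookup_series(86168921331, datetime(2024, 1, 10, 14, 0))
```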
This project uses the `invoke` Python library to provide a simple task CLI. Run `invoke -l` to see a list of available commands. The descriptions below are listed in the likely order you would run them and/or their importance.
`invoke stack.create`

Does the following:

Notes:

- For `stack.create` (and `stack.update` as well) you will be presented with a confirmation prompt to approve some of the provisioning operations or changes, typically those related to security and/or permissions.
- Use `stack.update` to modify an existing stack.
`invoke stack.status`
This will output some tables of information about the current state of the CloudFormation stack and the Lambda functions.
`invoke codebuild --revision=[tag or branch]`

Execute the CodeBuild project. This is the command that should be used to deploy and release new versions of the pipeline functions in a production environment. `--revision` is a required argument.

The build steps that CodeBuild will perform are defined in `buildspec.yml`.
`invoke stack.diff`
View a diff of CloudFormation changes to the stack.
`invoke stack.update`
Apply changes to the CloudFormation stack.
`invoke stack.changeset`

Like `stack.update` except that changes are captured in a CloudFormation changeset; execution of the update is deferred and must be done manually, most likely through the CloudFormation web console. Use this instead of `stack.update` if you want to be more cautious with the deployment. There are times when not every change an update is going to make is represented in the diff output of `stack.diff`. A changeset allows you to inspect what's going to change in more detail, and it can be discarded if it contains changes that are unwanted or incorrect for some reason.
`invoke stack.delete`
Delete the stack.
`invoke debug.{on,off}`

Enable/disable debug logging in the Lambda functions. This task adds or modifies a `DEBUG` environment variable in the Lambda function(s) settings.
`invoke update-requirements`

Does a bulk `pip-compile` upgrade of all base and function requirements.

Dependencies for the project as a whole and the individual functions are managed using the `pip-tools` command, `pip-compile`. Top-level dependencies are listed in a `.in` file which is then compiled to a "locked" `.txt` version.
There are four different contexts that require dependencies to be pip-compiled, one of which is the `invoke` tasks for packaging, deployment, and stack updates. There is unfortunately not a clean separation between what's required in each of these contexts. For instance, some of the base and dev context requirements depend on packages that are also used in the functions. It would be great to reconcile this at some point, but in the meantime just be sure, when updating or adding a package, to run pip-compile on the affected requirements files in this order:

1. `function_requirements/common-requirements.in`
1. `requirements/base.in`
1. `requirements/tox.in`
1. `requirements/dev.in`
Running pip-compile on a `.in` file will generate a corresponding `.txt` file which "locks" the dependent package versions. Both the `.in` and `.txt` files should be committed to version control.

The main situation in which this becomes necessary is when you need to update a particular package due to a vulnerability. For example, if the google-auth package needed to be updated you would run:
`pip-compile -P google-auth function_requirements/common-requirements.in`
Afterwards you would also need to `pip-compile` the remaining three "downstream" requirements files (in order), since they use the `-r` flag to import the `common-requirements.txt` file.

Finally, you'll want to run `pip-sync requirements/dev.txt` to ensure the packages are updated in your virtualenv.
Important: when updating the versions of `aws-cdk-lib` and `constructs` you must also update the version of `aws-cdk` specified in `package.json`.
The Lambda Python functions each have associated unit tests. To run them manually, execute:

`invoke test`

Alternatively, you can just run `tox` directly.
Lambda functions employ the concepts of "versions" and "aliases". Each time you push new code to a Lambda function it updates a special version signifier, `$LATEST`. If you wish to assign an actual version number to what is referenced by `$LATEST` you "publish" a new version. Versions are immutable and version numbers always increment by 1.
Aliases allow us to control which versions of the Lambda functions are invoked by the system. The Zoom Ingester uses a single alias, defined by the `.env` variable `LAMBDA_RELEASE_ALIAS` (default "live"), to designate the currently released version of each function. A new version of each function can be published, independent of the alias, as many times as you like. It is only when the release alias is updated to point to one of the new versions that the behavior of the ingester will change.
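The version/alias mechanics can be sketched in a few lines. This is a toy model of Lambda's behavior for intuition only, not ZIP code:

```python
class FunctionVersions:
    """Toy model of Lambda versions and a release alias."""
    def __init__(self):
        self.versions = {}   # version number -> published code (immutable)
        self.latest = None   # $LATEST: mutable, updated on every push
        self.aliases = {}    # alias name -> version number

    def push(self, code):
        self.latest = code   # deploying only moves $LATEST

    def publish(self):
        version = len(self.versions) + 1  # always increments by 1
        self.versions[version] = self.latest
        return version

    def release(self, alias, version):
        self.aliases[alias] = version     # invocations follow the alias

fn = FunctionVersions()
fn.push("v1 code")
v1 = fn.publish()
fn.release("live", v1)
fn.push("v2 code")       # "live" still points at version 1
v2 = fn.publish()
fn.release("live", v2)   # behavior changes only now
```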
When you first build the stack using the above "Initial Setup" steps, the version of the Lambda functions code will be whatever was current in your cloned repo. The Lambda function versions will be set to "1" and the release aliases ("live") will be pointing to this same version. At this point you may wish to re-release a specific tag or branch of the function code. In a production environment this should be done via the CodeBuild project, like so:
`invoke codebuild -r release-v1.0.0`

This command will trigger CodeBuild to package and release the function code from the GitHub repo identified by the "release-v1.0.0" tag. Each function will have a new Lambda version "2" published, and the release alias will be updated to point to this version.
Merge all changes for release into master. Check out the master branch and `git pull`.

First check that CodeBuild runs with the new changes on a dev stack. If there are new functions you must package them and ensure the code is in S3: `invoke package -u`, then `invoke stack.update`. Then run:

`invoke codebuild --revision=master`

Make sure the CodeBuild run completes successfully.
First update your tags:

`git tag -l | xargs git tag -d`

`git fetch --tags`

Then tag the release:

`git tag release-vX.X.X`

`git push --tags`
Make sure you are working on the correct Zoom Ingester stack and double-check the environment variables. If there are new functions you must package them and ensure the code is in S3: `invoke package -u`, then `invoke stack.update`. Then run:

`invoke codebuild --revision=release-vX.X.X`