[TOC]
The course-inventory application is designed to gather current-term Canvas LMS data about courses, enrollments, users, and course activity -- as well as data about the usage of other technologies, including Zoom and MiVideo -- in order to inform leadership at the University of Michigan about the usage of tools for teaching and learning. Currently, the application collects data from various APIs and data services managed by the Unizin Consortium. It then stores the data in an external MySQL database. Tableau dashboards and other processes then consume that data to generate reports and visualizations.
The sections below provide instructions for configuring, installing, using, and changing the application. Depending on the environment you plan to run the application in, you may also need to install some or all of the following: Docker and Docker Compose, Python 3 and virtualenv, and MySQL.
While performing any of the actions described below, use a terminal, text editor, or file utility as necessary. Sample command-line instructions are provided for some steps.
To configure the application before installation and usage (see the next section), you must first perform a few steps.
This includes the creation of a configuration file called env.hjson
using the HJSON format --
a more lenient and customizable variant of JSON. Complete the following items in order.
Clone and navigate into the repository.
git clone https://github.com/tl-its-umich-edu/course-inventory.git # HTTPS
git clone git@github.com:tl-its-umich-edu/course-inventory.git # SSH
cd course-inventory
Set up a MySQL database.
If you plan to run the application using virtualenv
, you will need to have MySQL installed on your machine.
You will also need to create a test database and user (an example is shown at the end of this step).
If you use Docker, you will instead use the database credentials specified in the docker-compose.yaml file.
This is in the environment
block (ignoring MYSQL_ROOT_PASSWORD
) for the mysql
service.
Whether you use virtualenv
or Docker, provide the database credentials within the INVENTORY_DB
object.
This is described more in step 4.
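If you are using virtualenv, a test database and user can be created from the MySQL shell, for example as follows. The database, user, and password names below are placeholders; whatever values you choose must match the ones you later supply in the INVENTORY_DB object.

```sql
-- Placeholder names: substitute your own and reuse them in INVENTORY_DB.
CREATE DATABASE course_inventory_local;
CREATE USER 'ci_local_user'@'localhost' IDENTIFIED BY 'ci_local_pw';
GRANT ALL PRIVILEGES ON course_inventory_local.* TO 'ci_local_user'@'localhost';
FLUSH PRIVILEGES;
```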
Move the template configuration file env_blank.hjson from the config directory to the secrets subdirectory, renaming it env.hjson.
mv config/env_blank.hjson config/secrets/env.hjson
Replace the default values inside env.hjson (empty strings, 0s, and provided values) with your desired values, ensuring they have the same data types. (A partial example appears after the table below.)
The table below describes the meaning and expected values of each key-value pair.
If the value of the outermost key is an object, the description may refer instead to the nested key column.
The application will also validate the configuration file you create using JSON Schema,
so look for error messages when first running the application.
Key | Nested Key | Description
--- | --- | ---
LOG_LEVEL | | The minimum level for log messages that will appear in output. INFO or DEBUG is recommended for most use cases; see Python's logging module.
JOB_NAMES | | The names of one or more jobs (not case sensitive) that have been implemented and defined in run_jobs.py (see the Implementing a New Job section below).
CREATE_CSVS | | A Boolean value (true or false) indicating whether CSVs should be generated by the execution.
MAX_REQ_ATTEMPTS | | The number of times a specific request will be attempted.
NUM_ASYNC_WORKERS | | Number of workers for asynchronous API calls; the default is 8.
CANVAS | CANVAS_ACCOUNT_ID | The Canvas instance root account ID number associated with the courses for which data will be collected.
CANVAS | CANVAS_TERM_IDS | The Canvas instance term ID numbers that will be used to limit queries for Canvas courses.
CANVAS | ADD_COURSE_IDS | Additional Canvas course IDs to retrieve when using online_meetings/canvas_zoom_meetings.py. Duplicate courses also found using CANVAS_TERM_IDS will be removed.
CANVAS | API_BASE_URL | The base URL for making requests using the U-M API Directory; the default value should be correct.
CANVAS | API_SCOPE_PREFIX | The scope prefix that will be added after the API_BASE_URL; this is usually an acronym for the university location and the API Directory subscription name in CamelCase, separated by /.
CANVAS | API_SUBSCRIPTION_NAME | The name of the API Directory subscription, all in lowercase.
CANVAS | API_CLIENT_ID | The client ID for authenticating to the API Directory.
CANVAS | API_CLIENT_SECRET | The client secret for authenticating to the API Directory.
CANVAS | CANVAS_URL | The Canvas instance URL to be used as the base URL for API requests that use the CANVAS_TOKEN.
CANVAS | CANVAS_TOKEN | The Canvas token used for authenticating to the API when not using the U-M API Directory.
MIVIDEO | udp_service_account_json_filename | The name of the JSON credential file for accessing UDP's Google BigQuery service account. It should be the umich-its-tl-reports-prod.json credential file for UMich ITS TL. This file name is appended to the value of ENV_DIR (/config/secrets by default) to determine the full path to the file. If this key's value is set to umich-its-tl-reports-prod.json and ENV_DIR has its default value, the full path to the file will be /config/secrets/umich-its-tl-reports-prod.json.
MIVIDEO | default_last_timestamp | The MiVideo procedures use the last timestamp found in their tables in this application's DB to query for data newer than that time. If that timestamp isn't found (e.g., the first time the application runs), the value of this property will be used. This must be a valid ISO 8601 timestamp in the UTC time zone. The recommended value is 2020-03-01T00:00:00+00:00.
MIVIDEO | kaltura_partner_id | An integer representing the Kaltura account number. UMich ITS TL users can find this value in the usual security files folder.
MIVIDEO | kaltura_user_secret | A string representing an administrator's key for the Kaltura account. UMich ITS TL users can find this value in the usual security files folder.
MIVIDEO | kaltura_categories_full_name_in | Filter for the Kaltura API to return media that have at least one category beginning with the string value of this key. The default value is "Canvas_UMich".
UDW | | An object containing the necessary credential information for connecting to the Unizin Data Warehouse, from which data will be pulled.
INVENTORY_DB | | An object containing the necessary credential information for connecting to a MySQL database, into which output data will be inserted.
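For illustration only, a partially filled env.hjson might look something like the following. All values shown here are placeholders, and the exact set of keys should always be taken from env_blank.hjson.

```hjson
{
  "LOG_LEVEL": "INFO",
  "JOB_NAMES": ["COURSE_INVENTORY"],
  "CREATE_CSVS": false,
  "MAX_REQ_ATTEMPTS": 3,
  "NUM_ASYNC_WORKERS": 8,
  "CANVAS": {
    // Placeholder values; the remaining CANVAS keys from the table above go here too.
    "CANVAS_ACCOUNT_ID": 1,
    "CANVAS_TERM_IDS": [111]
  }
  // The MIVIDEO, UDW, and INVENTORY_DB objects follow the same pattern;
  // see env_blank.hjson for the exact keys they require.
}
```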
This project provides a docker-compose.yaml
file to help simplify the development and testing process.
Invoking docker-compose
will set up MySQL and a database in a container.
It will then create a separate container for the job, which will ultimately insert records into the MySQL container's database.
Before beginning, perform the following additional steps to configure the project for Docker.
Create two paths in your home directory (i.e., ~
or ${HOME}
): secrets/course-inventory
and data/course-inventory
.
The docker-compose.yaml
file specifies two volumes that are mapped to these directories.
The first, secrets/course-inventory
, is mapped to config/secrets
.
The application expects to find the env.hjson
file in this location.
The second, data/course-inventory
, is mapped to the project's data
directory.
This will allow later access to CSV files optionally generated by the application.
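For example, the two directories described above can be created with the following command (assuming a Unix-like shell):

```sh
mkdir -p ~/secrets/course-inventory ~/data/course-inventory  # directories mounted by docker-compose.yaml
```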
Move the env.hjson
file to secrets/course-inventory
so it will be mapped into the job
container.
mv config/secrets/env.hjson ~/secrets/course-inventory
Once these steps are completed, you can use the standard docker-compose
commands to build and run the application.
Build the images for the mysql
and job
services.
docker-compose build
Start up the services.
docker-compose up
docker-compose up
will first start the MySQL container and then the job container.
When the job finishes, the job container will stop, but the MySQL container will continue running.
This allows you to enter the container and execute queries.
docker exec -it course_inventory_mysql /bin/bash
mysql --user=ci_user --password=ci_pw
Use ^C
to stop the running MySQL container,
or -- if you used the detached flag -d
with docker-compose up
-- use docker-compose down
.
Data in the MySQL database will persist after the container is stopped.
The MySQL data is stored in a volume mapped to the .data/
directory in the project.
To completely reset the database, delete the .data
directory.
Build images for all services…
docker-compose build
(Optional) Run the DB service, mysql
, in the background…
Note that if this optional step is skipped, docker-compose will automatically run the DB service in the background when the main application service is started. That's because the application depends on the DB, so docker-compose will conveniently run it based on the dependencies described in
docker-compose.yaml
.
docker-compose up -d mysql
The -d
option (short for --detach
) detaches the process from
the terminal and will "Run containers in the background, print
new container names."
If you need to see the console output of the mysql
service
while it runs in the background, use the logs
command and
the service name…
docker-compose logs mysql
Run the main application service, job
, in the foreground…
docker-compose up job
That will show the output from job
, then return you to the
shell prompt.
Do some development of job
's code. (Go ahead, we'll wait.)
When ready to run job
again, use the same command as before…
docker-compose up job
As before, that will show the output from job
, then return you
to the shell prompt.
This will work as long as docker-compose.yaml
is configured
to mount the project source code directory as /app
in the
container.
If the container is not running with the project source code
mounted as /app
, then most code changes will require you
to specify that the service needs to be rebuilt…
docker-compose up --build job
Repeat the previous two steps (3 and 4) as necessary.
To start up the job with the VSCode debugger, use this command, then attach with VSCode.
docker-compose -f docker-compose.yaml -f ./.vscode/docker-compose-ptvsd.yaml up job
You can also set up the application using virtualenv
by doing the following:
Create a virtual environment using virtualenv
.
virtualenv venv
source venv/bin/activate # for Mac OS
Install the dependencies specified in requirements.txt
.
pip install -r requirements.txt
Initialize the database using create_db.py
.
python create_db.py
Run the application.
python run_jobs.py
Deploying the application as a job using OpenShift and Jenkins involves several steps, which are beyond the scope of this README. However, a few details about how the job is configured are provided below.
The env.hjson
file described in the Configuration section above needs to be made available to
running course-inventory containers via an OpenShift ConfigMap, a type of Resource. A volume containing the ConfigMap
should be mapped to the config/secrets
subdirectory. These details will be specified in a YAML configuration file
defining the pod.
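As a rough illustration only (the resource names are placeholders, and the real deployment configuration may differ), the relevant portion of such a pod definition might resemble:

```yaml
# Hypothetical fragment of a pod definition; names are placeholders.
volumes:
  - name: env-secrets
    configMap:
      name: course-inventory-env   # ConfigMap holding env.hjson
containers:
  - name: job
    volumeMounts:
      - name: env-secrets
        mountPath: /config/secrets  # the application's default ENV_DIR
```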
By default, the application will run with the assumption that the HJSON configuration file will be named env.hjson
.
However, environ.py
will also check for the environment variables ENV_DIR
and ENV_FILE
.
These variables can be set using the OpenShift pod configuration file.
To use a different name for the HJSON file, set ENV_FILE
to the desired file name. The default
value is env.hjson
.
To use a different directory containing the HJSON file, set ENV_DIR
to the desired directory
path. The default value is /config/secrets
.
So that the yoyo-migrations
dependency can run successfully in a containerized environment,
the environment variable USER
should also be defined. For the value of USER
, use the name of the project running the job.
The yoyo-migrations
library will obtain this value using the
getpass.getuser
function from the Python standard library.
With the above variables set, the env
block in the YAML file will look something like this:
- env:
    - name: ENV_DIR
      value: /config/test_secrets
    - name: ENV_FILE
      value: env_test.json
    - name: USER
      value: project_name
The application was designed with the goal of being extensible -- in order to aid collaboration,
integrate new data sources, and satisfy new requirements.
This is primarily made possible by enabling the creation of new jobs,
which are managed by the run_jobs.py
file (the starting point for Docker).
When executed, the file will attempt to run all jobs provided in the value for the JOB_NAMES
variable in env.hjson
.
Only jobs previously defined in the codebase will actually be executed.
Follow the steps below to implement a new job that can be executed from run_jobs.py
.
All the changes described below (minus the configuration changes) should be included in the pull request.
Place files used only by the new job within a separate, appropriately named package (e.g. course_inventory
or online_meetings
).
Make use of variables from the env.hjson
configuration file by importing the ENV
variable from environ.py
.
Ensure you have one function or method defined that will kick off all other steps in the job.
It should return a list of DataSourceStatus
objects, each containing the name of a data source
used during the job, and a timestamp of when that data was updated (or collected).
These objects are used to create new records in the data_source_status
table of the
application database. Objects are instantiated in the following way:
DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER)
In place of VALID_DATA_SOURCE_NAME_MEMBER
, use a member of the ValidDataSourceName
enumeration defined in vocab.py
. The resulting object will include a timestamp for the
current time at which the object was instantiated. That is sufficient if the data source
doesn't provide a timestamp for the data.
If the data source does provide a timestamp for the data, use that. It can be passed into the instantiation as the second, optional argument:
DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER, some_timestamp)
The value for some_timestamp
must be a datetime
object (or equivalent; e.g., pd.Timestamp
)
with the time zone set to UTC.
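Putting the above together, a minimal sketch of a job's entry function might look like the following. The package, module, function, and data source names are hypothetical, and the import location of DataSourceStatus is an assumption; check the existing jobs for the actual patterns.

```python
# some_new_job/gather.py -- hypothetical module for a new job
from typing import List

from environ import ENV  # configuration values loaded from env.hjson
from vocab import ValidDataSourceName
from data_source_status import DataSourceStatus  # assumption: adjust to the module actually defining DataSourceStatus


def run_some_new_job() -> List[DataSourceStatus]:
    """Kick off all steps of the hypothetical SOME_NEW_JOB job."""
    num_workers = ENV.get('NUM_ASYNC_WORKERS', 8)  # assuming ENV behaves like a dict of env.hjson values
    # ... fetch data from the new data source and insert it into the application DB ...
    # Report which data source was used; the timestamp defaults to the current time.
    return [DataSourceStatus(ValidDataSourceName.SOME_NEW_DATA_SOURCE)]
```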
Add a new entry to the ValidJobName
enumeration within vocab.py
.
The name (on the left) should be in all capitals.
The value (on the right) should be a period-delimited path string,
where the first element is the package name,
the second is the module or file name,
and the third is the name of the job's entry method or function.
See vocab.py
for examples.
If you are introducing a new data source, you also need to add an entry to the ValidDataSourceName
enumeration.
The name should be all capitals; the value has no meaning for the application, so auto()
is sufficient.
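For example (hypothetical names; see vocab.py for the real entries and base classes), the new additions might look like this:

```python
# Hypothetical additions to the enumerations in vocab.py
from enum import Enum, auto


class ValidJobName(Enum):
    # NAME = 'package.module.entry_function'
    SOME_NEW_JOB = 'some_new_job.gather.run_some_new_job'


class ValidDataSourceName(Enum):
    # The value carries no meaning for the application, so auto() is sufficient.
    SOME_NEW_DATA_SOURCE = auto()
```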
Add the job name to the JOB_NAMES
variable in your env.hjson configuration file.
Currently, the database is version-controlled and managed using the yoyo-migrations
Python library.
The migration files are located in the db/migrations
directory.
To make changes to the database schema, perform the following steps in order.
Add a new migration file to the migrations
directory called XXXX.add_something.py
, where XXXX
is the next migration number (preceded by 0
s until the number is four digits) and add_something
is a brief description of the change being made.
Within the file, import the step
function from yoyo
.
For each desired schema change, pass a SQL string to step
.
Multiple step invocations can be enclosed in a list and assigned to a steps
variable.
Place each step
in the order it should be applied.
Migrations can also specify dependencies on previous migrations using the format __depends__ = {"000X.migration_name_without_file_ending"}
.
Refer to the existing migrations if examples are needed.
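For instance, a hypothetical migration file db/migrations/0005.add_something.py (the number, table, and dependency shown are placeholders) might contain:

```python
# Hypothetical migration; adjust the number, SQL, and dependency to your change.
from yoyo import step

# Optional: declare a dependency on an earlier migration (file name without the .py ending).
__depends__ = {'0004.name_of_previous_migration'}

steps = [
    step(
        '''
        CREATE TABLE something (
            id INTEGER NOT NULL AUTO_INCREMENT,
            name VARCHAR(255) NOT NULL,
            PRIMARY KEY (id)
        );
        '''
    ),
    step('CREATE INDEX something_name_idx ON something (name);')
]
```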
Relevant Canvas API Documentation
Other Technology in Use
hjson
Python package: https://pypi.org/project/hjson/