[TOC]
The course-inventory application is designed to gather current-term Canvas LMS data about courses, enrollments, users, and course activity -- as well as data about the usage of other technologies, including Zoom and MiVideo -- in order to inform leadership at the University of Michigan about the usage of tools for teaching and learning. Currently, the application collects data from various APIs and data services managed by the Unizin Consortium. It then stores the data in an external MySQL database. Tableau dashboards and other processes then consume that data to generate reports and visualizations.
The sections below provide instructions for configuring, installing, using, and changing the application. Depending on the environment you plan to run the application in, you may also need to install some or all of the following: Docker and Docker Compose, Python 3 and virtualenv, and MySQL.
While performing any of the actions described below, use a terminal, text editor, or file utility as necessary. Sample command-line instructions are provided for some steps.
To configure the application before installation and usage (see the next section), you must first perform a few steps.
This includes the creation of a configuration file called env.hjson
using the HJSON format --
a more lenient and customizable variant of JSON. Complete the following items in order.
Clone and navigate into the repository.
git clone https://github.com/tl-its-umich-edu/course-inventory.git # HTTPS
git clone git@github.com:tl-its-umich-edu/course-inventory.git # SSH
cd course-inventory
Set up a MySQL database.
If you plan to run the application using virtualenv
, you will need to have MySQL installed on your machine.
You will also need to create a test database and user (an example is shown at the end of this step).
If you use Docker, you will instead use the database credentials specified in the docker-compose.yaml file.
This is in the environment
block (ignoring MYSQL_ROOT_PASSWORD
) for the mysql
service.
Whether you use virtualenv
or Docker, provide the database credentials within the INVENTORY_DB
object.
This is described more in step 4.
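If you are using virtualenv, a test database and user can be created from the MySQL shell, for example as follows. The database, user, and password names below are placeholders; whatever values you choose must match the ones you later supply in the INVENTORY_DB object.

```sql
-- Placeholder names: substitute your own and reuse them in INVENTORY_DB.
CREATE DATABASE course_inventory_local;
CREATE USER 'ci_local_user'@'localhost' IDENTIFIED BY 'ci_local_pw';
GRANT ALL PRIVILEGES ON course_inventory_local.* TO 'ci_local_user'@'localhost';
FLUSH PRIVILEGES;
```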
Move the template configuration file env_blank.hjson from the config directory to the secrets subdirectory, renaming it env.hjson.
mv config/env_blank.hjson config/secrets/env.hjson
Replace the default values inside env.hjson (empty strings, 0s, and provided values) with your desired values, ensuring they have the same data types. (A partial example appears after the table below.)
The table below describes the meaning and expected values of each key-value pair.
If the value of the outermost key is an object, the description may refer instead to the nested key column.
The application will also validate the configuration file you create using JSON Schema,
so look for error messages when first running the application.
Key | Nested Key | Description
--- | --- | ---
LOG_LEVEL | | The minimum level for log messages that will appear in output. INFO or DEBUG is recommended for most use cases; see Python's logging module.
JOB_NAMES | | The names of one or more jobs (not case sensitive) that have been implemented and defined in run_jobs.py (see the Implementing a New Job section below).
CREATE_CSVS | | A Boolean value (true or false) indicating whether CSVs should be generated by the execution.
MAX_REQ_ATTEMPTS | | The number of times a specific request will be attempted.
NUM_ASYNC_WORKERS | | Number of workers for asynchronous API calls; the default is 8.
CANVAS | CANVAS_ACCOUNT_ID | The Canvas instance root account ID number associated with the courses for which data will be collected.
CANVAS | CANVAS_TERM_IDS | The Canvas instance term ID numbers that will be used to limit queries for Canvas courses.
CANVAS | ADD_COURSE_IDS | Additional Canvas course IDs to retrieve when using online_meetings/canvas_zoom_meetings.py. Duplicate courses also found using CANVAS_TERM_IDS will be removed.
CANVAS | API_BASE_URL | The base URL for making requests using the U-M API Directory; the default value should be correct.
CANVAS | API_SCOPE_PREFIX | The scope prefix that will be added after the API_BASE_URL; this is usually an acronym for the university location and the API Directory subscription name in CamelCase, separated by /.
CANVAS | API_SUBSCRIPTION_NAME | The name of the API Directory subscription, all in lowercase.
CANVAS | API_CLIENT_ID | The client ID for authenticating to the API Directory.
CANVAS | API_CLIENT_SECRET | The client secret for authenticating to the API Directory.
CANVAS | CANVAS_URL | The Canvas instance URL to be used as the base URL for API requests that use the CANVAS_TOKEN.
CANVAS | CANVAS_TOKEN | The Canvas token used for authenticating to the API when not using the U-M API Directory.
MIVIDEO | udp_service_account_json_filename | The name of the JSON credential file for accessing UDP's Google BigQuery service account. It should be the umich-its-tl-reports-prod.json credential file for UMich ITS TL. This file name is appended to the value of ENV_DIR (/config/secrets by default) to determine the full path to the file. If this key's value is set to umich-its-tl-reports-prod.json and ENV_DIR has its default value, the full path to the file will be /config/secrets/umich-its-tl-reports-prod.json.
MIVIDEO | default_last_timestamp | The MiVideo procedures use the last timestamp found in their tables in this application's DB to query for data newer than that time. If that timestamp isn't found (e.g., the first time the application runs), the value of this property will be used. This must be a valid ISO 8601 timestamp in the UTC time zone. The recommended value is 2020-03-01T00:00:00+00:00.
MIVIDEO | kaltura_partner_id | An integer representing the Kaltura account number. UMich ITS TL users can find this value in the usual security files folder.
MIVIDEO | kaltura_user_secret | A string representing an administrator's key for the Kaltura account. UMich ITS TL users can find this value in the usual security files folder.
MIVIDEO | kaltura_categories_full_name_in | Filter for the Kaltura API to return media that have at least one category beginning with the string value of this key. The default value is "Canvas_UMich".
UDW | | An object containing the necessary credential information for connecting to the Unizin Data Warehouse, from which data will be pulled.
INVENTORY_DB | | An object containing the necessary credential information for connecting to a MySQL database, into which output data will be inserted.
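For illustration only, a partially filled env.hjson might look something like the following. All values shown here are placeholders, and the exact set of keys should always be taken from env_blank.hjson.

```hjson
{
  "LOG_LEVEL": "INFO",
  "JOB_NAMES": ["COURSE_INVENTORY"],
  "CREATE_CSVS": false,
  "MAX_REQ_ATTEMPTS": 3,
  "NUM_ASYNC_WORKERS": 8,
  "CANVAS": {
    // Placeholder values; the remaining CANVAS keys from the table above go here too.
    "CANVAS_ACCOUNT_ID": 1,
    "CANVAS_TERM_IDS": [111]
  }
  // The MIVIDEO, UDW, and INVENTORY_DB objects follow the same pattern;
  // see env_blank.hjson for the exact keys they require.
}
```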
This project provides a docker-compose.yaml
file to help simplify the development and testing process.
Invoking docker-compose
will set up MySQL and a database in a container.
It will then create a separate container for the job, which will ultimately insert records into the MySQL container's database.
Before beginning, perform the following additional steps to configure the project for Docker.
Create two paths in your home directory (i.e., ~
or ${HOME}
): secrets/course-inventory
and data/course-inventory
.
The docker-compose.yaml
file specifies two volumes that are mapped to these directories.
The first, secrets/course-inventory
, is mapped to config/secrets
.
The application expects to find the env.hjson
file in this location.
The second, data/course-inventory
, is mapped to the project's data
directory.
This will allow later access to CSV files optionally generated by the application.
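For example, the two directories described above can be created with the following command (assuming a Unix-like shell):

```sh
mkdir -p ~/secrets/course-inventory ~/data/course-inventory  # directories mounted by docker-compose.yaml
```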
Move the env.hjson
file to secrets/course-inventory
so it will be mapped into the job
container.
mv config/secrets/env.hjson ~/secrets/course-inventory
Once these steps are completed, you can use the standard docker-compose
commands to build and run the application.
Build the images for the mysql
and job
services.
docker-compose build
Start up the services.
docker-compose up
docker-compose up
will first start the MySQL container and then the job container.
When the job finishes, the job container will stop, but the MySQL container will continue running.
This allows you to enter the container and execute queries.
docker exec -it course_inventory_mysql /bin/bash
mysql --user=ci_user --password=ci_pw
Use ^C
to stop the running MySQL container,
or -- if you used the detached flag -d
with docker-compose up
-- use docker-compose down
.
Data in the MySQL database will persist after the container is stopped.
The MySQL data is stored in a volume mapped to the .data/
directory in the project.
To completely reset the database, delete the .data
directory.
Build images for all services…
docker-compose build
(Optional) Run the DB service, mysql
, in the background…
Note that if this optional step is skipped, docker-compose will automatically run the DB service in the background when the main application service is started. That's because the application depends on the DB, so docker-compose will conveniently run it based on the dependencies described in
docker-compose.yaml
.
docker-compose up -d mysql
The -d
option (short for --detach
) detaches the process from
the terminal and will "Run containers in the background, print
new container names."
If you need to see the console output of the mysql
service
while it runs in the background, use the logs
command and
the service name…
docker-compose logs mysql
Run the main application service, job
, in the foreground…
docker-compose up job
That will show the output from job
, then return you to the
shell prompt.
Do some development of job
's code. (Go ahead, we'll wait.)
When ready to run job
again, use the same command as before…
docker-compose up job
As before, that will show the output from job
, then return you
to the shell prompt.
This will work as long as docker-compose.yaml
is configured
to mount the project source code directory as /app
in the
container.
If the container is not running with the project source code
mounted as /app
, then most code changes will require you
to specify that the service needs to be rebuilt…
docker-compose up --build job
Repeat the previous two steps (3 and 4) as necessary.
To start up the job with the VSCode debugger, use this command, then attach with VSCode.
docker-compose -f docker-compose.yaml -f ./.vscode/docker-compose-ptvsd.yaml up job
You can also set up the application using virtualenv
by doing the following:
Create a virtual environment using virtualenv
.
virtualenv venv
source venv/bin/activate # for Mac OS
Install the dependencies specified in requirements.txt
.
pip install -r requirements.txt
Initialize the database using create_db.py
.
python create_db.py
Run the application.
python run_jobs.py
Deploying the application as a job using OpenShift and Jenkins involves several steps, which are beyond the scope of this README. However, a few details about how the job is configured are provided below.
The env.hjson
file described in the Configuration section above needs to be made available to
running course-inventory containers via an OpenShift ConfigMap, a type of Resource. A volume containing the ConfigMap
should be mapped to the config/secrets
subdirectory. These details will be specified in a YAML configuration file
defining the pod.
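As a rough illustration only (the resource names are placeholders, and the real deployment configuration may differ), the relevant portion of such a pod definition might resemble:

```yaml
# Hypothetical fragment of a pod definition; names are placeholders.
volumes:
  - name: env-secrets
    configMap:
      name: course-inventory-env   # ConfigMap holding env.hjson
containers:
  - name: job
    volumeMounts:
      - name: env-secrets
        mountPath: /config/secrets  # the application's default ENV_DIR
```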
By default, the application will run with the assumption that the HJSON configuration file will be named env.hjson
.
However, environ.py
will also check for the environment variables ENV_DIR
and ENV_FILE
.
These variables can be set using the OpenShift pod configuration file.
To use a different name for the HJSON file, set ENV_FILE
to the desired file name. The default
value is env.hjson
.
To use a different directory containing the HJSON file, set ENV_DIR
to the desired directory
path. The default value is /config/secrets
.
So that the yoyo-migrations
dependency can run successfully in a containerized environment,
the environment variable USER
should also be defined. For the value of USER
, use the name of the project running the job.
The yoyo-migrations
library will obtain this value using the
getpass.getuser
function from the Python standard library.
With the above variables set, the env
block in the YAML file will look something like this:
- env:
    - name: ENV_DIR
      value: /config/test_secrets
    - name: ENV_FILE
      value: env_test.json
    - name: USER
      value: project_name
The application was designed with the goal of being extensible -- in order to aid collaboration,
integrate new data sources, and satisfy new requirements.
This is primarily made possible by enabling the creation of new jobs,
which are managed by the run_jobs.py
file (the starting point for Docker).
When executed, the file will attempt to run all jobs provided in the value for the JOB_NAMES
variable in env.hjson
.
Only jobs previously defined in the codebase will actually be executed.
Follow the steps below to implement a new job that can be executed from run_jobs.py
.
All the changes described below (minus the configuration changes) should be included in the pull request.
Place files used only by the new job within a separate, appropriately named package (e.g. course_inventory
or online_meetings
).
Make use of variables from the env.hjson
configuration file by importing the ENV
variable from environ.py
.
Ensure you have one function or method defined that will kick off all other steps in the job.
It should return a list of DataSourceStatus
objects, each containing the name of a data source
used during the job, and a timestamp of when that data was updated (or collected).
These objects are used to create new records in the data_source_status
table of the
application database. Objects are instantiated in the following way:
DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER)
In place of VALID_DATA_SOURCE_NAME_MEMBER
, use a member of the ValidDataSourceName
enumeration defined in vocab.py
. The resulting object will include a timestamp for the
current time at which the object was instantiated. That is sufficient if the data source
doesn't provide a timestamp for the data.
If the data source does provide a timestamp for the data, use that. It can be passed into the instantiation as the second, optional argument:
DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER, some_timestamp)
The value for some_timestamp
must be a datetime
object (or equivalent; e.g., pd.Timestamp
)
with the time zone set to UTC.
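Putting the above together, a minimal sketch of a job's entry function might look like the following. The package, module, function, and data source names are hypothetical, and the import location of DataSourceStatus is an assumption; check the existing jobs for the actual patterns.

```python
# some_new_job/gather.py -- hypothetical module for a new job
from typing import List

from environ import ENV  # configuration values loaded from env.hjson
from vocab import ValidDataSourceName
from data_source_status import DataSourceStatus  # assumption: adjust to the module actually defining DataSourceStatus


def run_some_new_job() -> List[DataSourceStatus]:
    """Kick off all steps of the hypothetical SOME_NEW_JOB job."""
    num_workers = ENV.get('NUM_ASYNC_WORKERS', 8)  # assuming ENV behaves like a dict of env.hjson values
    # ... fetch data from the new data source and insert it into the application DB ...
    # Report which data source was used; the timestamp defaults to the current time.
    return [DataSourceStatus(ValidDataSourceName.SOME_NEW_DATA_SOURCE)]
```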
Add a new entry to the ValidJobName
enumeration within vocab.py
.
The name (on the left) should be in all capitals.
The value (on the right) should be a period-delimited path string,
where the first element is the package name,
the second is the module or file name,
and the third is the name of the job's entry method or function.
See vocab.py
for examples.
If you are introducing a new data source, you also need to add an entry to the ValidDataSourceName
enumeration.
The name should be all capitals; the value has no meaning for the application, so auto()
is sufficient.
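For example (hypothetical names; see vocab.py for the real entries and base classes), the new additions might look like this:

```python
# Hypothetical additions to the enumerations in vocab.py
from enum import Enum, auto


class ValidJobName(Enum):
    # NAME = 'package.module.entry_function'
    SOME_NEW_JOB = 'some_new_job.gather.run_some_new_job'


class ValidDataSourceName(Enum):
    # The value carries no meaning for the application, so auto() is sufficient.
    SOME_NEW_DATA_SOURCE = auto()
```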
Add the job name to the JOB_NAMES
variable in your env.hjson configuration file.
Currently, the database is version-controlled and managed using the yoyo-migrations
Python library.
The migration files are located in the db/migrations
directory.
To make changes to the database schema, perform the following steps in order.
Add a new migration file to the migrations
directory called XXXX.add_something.py
, where XXXX
is the next migration number (preceded by 0
s until the number is four digits) and add_something
is a brief description of the change being made.
Within the file, import the step
function from yoyo
.
For each desired schema change, pass a SQL string to step
.
Multiple step invocations can be enclosed in a list and assigned to a steps
variable.
Place each step
in the order it should be applied.
Migrations can also specify dependencies on previous migrations using the format __depends__ = {"000X.migration_name_without_file_ending"}
.
Refer to the existing migrations if examples are needed.
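For instance, a hypothetical migration file db/migrations/0005.add_something.py (the number, table, and dependency shown are placeholders) might contain:

```python
# Hypothetical migration; adjust the number, SQL, and dependency to your change.
from yoyo import step

# Optional: declare a dependency on an earlier migration (file name without the .py ending).
__depends__ = {'0004.name_of_previous_migration'}

steps = [
    step(
        '''
        CREATE TABLE something (
            id INTEGER NOT NULL AUTO_INCREMENT,
            name VARCHAR(255) NOT NULL,
            PRIMARY KEY (id)
        );
        '''
    ),
    step('CREATE INDEX something_name_idx ON something (name);')
]
```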
Relevant Canvas API Documentation
Other Technology in Use
hjson
Python package: https://pypi.org/project/hjson/