
course-inventory


[TOC]

Overview

The course-inventory application is designed to gather current-term Canvas LMS data about courses, enrollments, users, and course activity -- as well as data about the usage of other technologies, including Zoom and MiVideo -- in order to inform leadership at the University of Michigan about the usage of tools for teaching and learning. Currently, the application collects data from various APIs and data services managed by the Unizin Consortium. It then stores the data in an external MySQL database. Tableau dashboards and other processes then consume that data to generate reports and visualizations.

Development

Prerequisites

The sections below provide instructions for configuring, installing, using, and changing the application. Depending on the environment you plan to run the application in, you may also need to install some or all of the following: Python 3, Docker and Docker Compose, and MySQL.

While performing any of the actions described below, use a terminal, text editor, or file utility as necessary. Sample command-line instructions are provided for some steps.

Configuration

To configure the application before installation and usage (see the next section), you must first perform a few steps. This includes creating a configuration file called env.hjson using the HJSON format, a more lenient, human-friendly variant of JSON. Complete the following items in order.

  1. Clone and navigate into the repository.

    git clone https://github.com/tl-its-umich-edu/course-inventory.git  # HTTPS
    git clone git@github.com:tl-its-umich-edu/course-inventory.git      # SSH
    
    cd course-inventory
  2. Set up a MySQL database.

    If you plan to run the application using virtualenv, you will need to have MySQL installed on your machine. You will also need to create a test database and user.

    If you use Docker, you will instead use the database credentials specified in docker-compose.yaml; they appear in the environment block of the mysql service (MYSQL_ROOT_PASSWORD can be ignored).

    Whether you use virtualenv or Docker, provide the database credentials within the INVENTORY_DB object. This is described further in step 4.

  3. Rename the template configuration file env_blank.hjson, found in the config directory, to env.hjson, and move it into the secrets subdirectory.

    mv config/env_blank.hjson config/secrets/env.hjson
  4. Replace the default values inside env.hjson (empty strings, 0s, and provided values) with the desired values, ensuring each keeps the same data type. The list below describes the meaning and expected value of each key. When the value of an outermost key is an object, its nested keys are listed in the form KEY / NESTED_KEY. The application also validates the configuration file against a JSON Schema, so watch for error messages when first running the application.

    LOG_LEVEL: The minimum level for log messages that will appear in output. INFO or DEBUG is recommended for most use cases; see Python's logging module.
    JOB_NAMES: The names of one or more jobs (not case sensitive) that have been implemented and defined in run_jobs.py (see the Implementing a New Job section below).
    CREATE_CSVS: A Boolean value (true or false) indicating whether CSVs should be generated by the execution.
    MAX_REQ_ATTEMPTS: The number of times a specific request will be attempted.
    NUM_ASYNC_WORKERS: The number of workers for asynchronous API calls; the default is 8.
    CANVAS / CANVAS_ACCOUNT_ID: The Canvas instance root account ID number associated with the courses for which data will be collected.
    CANVAS / CANVAS_TERM_IDS: The Canvas instance term ID numbers that will be used to limit queries for Canvas courses.
    CANVAS / ADD_COURSE_IDS: Additional Canvas course IDs to retrieve when using online_meetings/canvas_zoom_meetings.py. Duplicates of courses already found using CANVAS_TERM_IDS will be removed.
    CANVAS / API_BASE_URL: The base URL for making requests using the U-M API Directory; the default value should be correct.
    CANVAS / API_SCOPE_PREFIX: The scope prefix added after API_BASE_URL; this is usually an acronym for the university location and the API Directory subscription name in CamelCase, separated by /.
    CANVAS / API_SUBSCRIPTION_NAME: The name of the API Directory subscription, all in lowercase.
    CANVAS / API_CLIENT_ID: The client ID for authenticating to the API Directory.
    CANVAS / API_CLIENT_SECRET: The client secret for authenticating to the API Directory.
    CANVAS / CANVAS_URL: The Canvas instance URL to be used as the base URL for API requests that use CANVAS_TOKEN.
    CANVAS / CANVAS_TOKEN: The Canvas token used for authenticating to the API when not using the U-M API Directory.
    MIVIDEO / udp_service_account_json_filename: The name of the JSON credential file for accessing UDP's Google BigQuery service account; for UMich ITS TL, this should be the umich-its-tl-reports-prod.json credential file. The file name is appended to the value of ENV_DIR (/config/secrets by default) to determine the full path to the file. For example, if this key is set to umich-its-tl-reports-prod.json and ENV_DIR has its default value, the full path will be /config/secrets/umich-its-tl-reports-prod.json.
    MIVIDEO / default_last_timestamp: The MiVideo procedures use the last timestamp found in their tables in this application's DB to query for data newer than that time. If no such timestamp is found (e.g., the first time the application runs), the value of this property is used instead. It must be a valid ISO 8601 timestamp in the UTC time zone; the recommended value is 2020-03-01T00:00:00+00:00.
    MIVIDEO / kaltura_partner_id: An integer representing the Kaltura account number. UMich ITS TL users can find this value in the usual security files folder.
    MIVIDEO / kaltura_user_secret: A string representing an administrator's key for the Kaltura account. UMich ITS TL users can find this value in the usual security files folder.
    MIVIDEO / kaltura_categories_full_name_in: A filter for the Kaltura API to return only media with at least one category that begins with this string value. The default value is "Canvas_UMich".
    UDW: An object containing the credentials needed to connect to the Unizin Data Warehouse, from which data will be pulled.
    INVENTORY_DB: An object containing the credentials needed to connect to the MySQL database where output data will be inserted.
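
For illustration, here is a minimal, hypothetical sketch showing how a completed env.hjson could be loaded and spot-checked with the Python hjson library. The key names come from the list above, but the script itself is not part of the repository, and the exact types it checks are assumptions based on the descriptions.

    import hjson

    # Load the completed configuration from the default location.
    with open('config/secrets/env.hjson') as config_file:
        env = hjson.load(config_file)

    # Spot-check a few top-level values (expected types are assumptions).
    assert env['LOG_LEVEL'] in ('DEBUG', 'INFO', 'WARNING', 'ERROR')
    assert isinstance(env['CREATE_CSVS'], bool)
    assert isinstance(env['MAX_REQ_ATTEMPTS'], int)
    assert isinstance(env['JOB_NAMES'], list)  # job names defined in run_jobs.py

    # Nested objects hold the Canvas settings and database credentials.
    assert isinstance(env['CANVAS']['CANVAS_ACCOUNT_ID'], int)
    assert isinstance(env['INVENTORY_DB'], dict)

    print('env.hjson looks structurally sound')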

Installation & Usage

With Docker

This project provides a docker-compose.yaml file to help simplify the development and testing process. Invoking docker-compose will set up a MySQL server and database in one container and then create a separate container for the job, which will ultimately insert records into the MySQL container's database.

Before beginning, perform the following additional steps to configure the project for Docker.

  1. Create two directories in your home directory (i.e., ~ or ${HOME}): secrets/course-inventory and data/course-inventory.

    The docker-compose.yaml file specifies two volumes that are mapped to these directories. The first, secrets/course-inventory, is mapped to config/secrets. The application expects to find the env.hjson file in this location. The second, data/course-inventory, is mapped to the project's data directory. This will allow later access to CSV files optionally generated by the application.

  2. Move the env.hjson file to ~/secrets/course-inventory so it will be mapped into the job container.

    mv config/secrets/env.hjson ~/secrets/course-inventory

Once these steps are completed, you can use the standard docker-compose commands to build and run the application.

  1. Build the images for the mysql and job services.

    docker-compose build
  2. Start up the services.

    docker-compose up

docker-compose up will first start the MySQL container and then the job container. When the job finishes, the job container will stop, but the MySQL container will continue running. This allows you to enter the container and execute queries.

docker exec -it course_inventory_mysql /bin/bash
mysql --user=ci_user --password=ci_pw

Use ^C to stop the running MySQL container, or -- if you used the detached flag -d with docker-compose up -- use docker-compose down.

Data in the MySQL database will persist after the container is stopped. The MySQL data is stored in a volume mapped to the .data/ directory in the project. To completely reset the database, delete the .data directory.

A Typical Development Cycle With Docker
  1. Build images for all services…

    docker-compose build
    • (Optional) Run the DB service, mysql, in the background…

      Note that if this optional step is skipped, docker-compose will automatically run the DB service in the background when the main application service is started. That's because the application depends on the DB, so docker-compose will conveniently run it based on the dependencies described in docker-compose.yaml.

      docker-compose up -d mysql

      The -d option (short for --detach) detaches the process from the terminal and will "Run containers in the background, print new container names."

      • If you need to see the console output of the mysql service while it runs in the background, use the logs command and the service name…

        docker-compose logs mysql
  2. Run the main application service, job, in the foreground…

    docker-compose up job

    That will show the output from job, then return you to the shell prompt.

  3. Do some development of job's code. (Go ahead, we'll wait.)

  4. When ready to run job again, use the same command as before…

    docker-compose up job

    As before, that will show the output from job, then return you to the shell prompt.

    This will work as long as docker-compose.yaml is configured to mount the project source code directory as /app in the container.

    • If the container is not running with the project source code mounted as /app, then most code changes will require rebuilding the service…

      docker-compose up --build job
  5. Repeat the previous two steps (3 and 4) as necessary.

  6. To start up the job with VSCode debugging, use the following command and then attach with VSCode.

    docker-compose -f docker-compose.yaml -f ./.vscode/docker-compose-ptvsd.yaml up job

With a Virtual Environment

You can also set up the application using virtualenv by doing the following:

  1. Create a virtual environment using virtualenv.

    virtualenv venv
    source venv/bin/activate  # for Mac OS
  2. Install the dependencies specified in requirements.txt.

    pip install -r requirements.txt
  3. Initialize the database using create_db.py.

    python create_db.py
  4. Run the application.

    python run_jobs.py

OpenShift Deployment

Deploying the application as a job using OpenShift and Jenkins involves several steps, which are beyond the scope of this README. However, a few details about how the job is configured are provided below.

Implementing a New Job

The application was designed with the goal of being extensible -- in order to aid collaboration, integrate new data sources, and satisfy new requirements. This is primarily made possible by enabling the creation of new jobs, which are managed by the run_jobs.py file (the starting point for Docker). When executed, the file will attempt to run all jobs provided in the value of the JOB_NAMES variable in env.hjson. Only jobs previously defined in the codebase will actually be executed.

Follow the steps below to implement a new job that can be executed from run_jobs.py. All the changes described below (minus the configuration changes) should be included in the pull request.

  1. Place files used only by the new job within a separate, appropriately named package (e.g. course_inventory or online_meetings).

  2. Make use of variables from the env.hjson configuration file by importing the ENV variable from environ.py.

  3. Ensure you have one function or method defined that will kick off all other steps in the job. It should return a list of DataSourceStatus objects, each containing the name of a data source used during the job, and a timestamp of when that data was updated (or collected).

    These objects are used to create new records in the data_source_status table of the application database. Objects are instantiated in the following way:

    DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER)

    In place of VALID_DATA_SOURCE_NAME_MEMBER, use a member of the ValidDataSourceName enumeration defined in vocab.py. The resulting object will include a timestamp recording when it was instantiated, which is sufficient if the data source doesn't provide a timestamp for the data.

    If the data source does provide a timestamp for the data, use that. It can be passed into the instantiation as the second, optional argument:

    DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER, some_timestamp)

    The value for some_timestamp must be a datetime object (or equivalent, e.g. a pd.Timestamp) with the time zone set to UTC. (A fuller sketch combining these pieces appears after this list.)

  4. Add a new entry to the ValidJobName enumeration within vocab.py. The name (on the left) should be in all capitals. The value (on the right) should be a period-delimited path string, where the first element is the package name, the second is the module or file name, and the third is the name of the job's entry method or function. See vocab.py for examples.

  5. If you are introducing a new data source, you also need to add an entry to the ValidDataSourceName enumeration. The name should be all capitals; the value has no meaning for the application, so auto() is sufficient.

  6. Add the job name to the JOB_NAMES variable in env.hjson.
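
To tie these steps together, here is a hedged sketch of what a new job module and its vocab.py additions might look like. The package, module, function, and member names used below (my_new_job, collect, run_my_new_job, MY_NEW_SOURCE, MY_NEW_JOB) are placeholders invented for illustration, and the import location of DataSourceStatus is an assumption; only ENV, DataSourceStatus, ValidDataSourceName, and ValidJobName are named by this README.

    # my_new_job/collect.py (hypothetical package and module names)
    from typing import List

    from environ import ENV  # configuration values loaded from env.hjson
    from vocab import DataSourceStatus, ValidDataSourceName  # DataSourceStatus location assumed


    def run_my_new_job() -> List[DataSourceStatus]:
        """Entry function that kicks off every step of the job."""
        # Read settings via the ENV dict, for example:
        canvas_account_id = ENV['CANVAS']['CANVAS_ACCOUNT_ID']

        # ... gather data from the new source and insert it into the application DB ...

        # Report one status object per data source used; pass a UTC datetime as a
        # second argument if the source supplies its own update timestamp.
        return [DataSourceStatus(ValidDataSourceName.MY_NEW_SOURCE)]

The matching vocab.py entries would then be a ValidJobName member along the lines of MY_NEW_JOB = 'my_new_job.collect.run_my_new_job' and a ValidDataSourceName member such as MY_NEW_SOURCE = auto(), with MY_NEW_JOB added to JOB_NAMES in env.hjson.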

Database Management and Schema Changes

Currently, the database is version-controlled and managed using the yoyo-migrations Python library. The migration files are located in the db/migrations directory.

To make changes to the database schema, perform the following steps in order.

  1. Add a new migration file to the migrations directory called XXXX.add_something.py, where XXXX is the next migration number (zero-padded to four digits) and add_something briefly describes the change being made.

  2. Within the file, import the step function from yoyo. For each desired schema change, pass a SQL string to step. Multiple step invocations can be enclosed in a list and assigned to a steps variable. Place each step in the order it should be applied. Migrations can also specify dependencies on previous migrations using the format __depends__ = {"000X.migration_name_without_file_ending"}.

    Refer to the existing migrations if examples are needed.
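
As a concrete illustration, a migration file might look like the sketch below. The migration number, file name, table, and dependency shown are hypothetical and not part of this project's actual schema.

    # db/migrations/0005.add_example_table.py (hypothetical file name)
    from yoyo import step

    # Depend on an earlier migration, referenced without its file ending.
    __depends__ = {'0004.add_something_else'}

    # Each step takes the SQL to apply and, optionally, the SQL to roll it back.
    steps = [
        step(
            'CREATE TABLE example_table ('
            'id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, '
            'name VARCHAR(100) NOT NULL'
            ')',
            'DROP TABLE example_table'
        )
    ]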

Other Resources

Relevant Canvas API Documentation

Other Technology in Use