Pod.Cast 🎱 🐋 | Annotation system

Developed by Prakruti Gogia, Akash Mahajan and Nithya Govindarajan during Microsoft AI4Earth & OneWeek hackathons. (This is a volunteer-driven project and is not an official product.)

For a general introduction to the Pod.Cast project, initiated in 2019, and its relationship to other AI for Orcas efforts, please read the Pod.Cast project general overview at ai4orcas.net.

Technical Overview

podcast_server.py is a prototype Flask-based web app for labelling unlabelled bioacoustic recordings while viewing predictions from a model. It is useful for setting up quick-and-dirty labelling sessions that don't need advanced features such as automated model inference, user access roles, interfacing with other backends, or gamification.

(See prediction-explorer for a related tool that runs locally and lets you quickly visualize & browse model predictions on a set of audio files.)

- Each page/session gets a unique URL (via the sessionid URL param) that you can share if you find something interesting.
- Refer to the instructions on the page for how to edit model predictions or create annotations.
- The progress bar tracks the current "round" of unlabelled sessions for which annotations have been submitted.
- If you aren't sure, or want to see a new one, skip & refresh loads a random (un-annotated) session without submitting anything.

Dataset Creation

This tool has been used in an active learning style to create & release new training & test sets at orcadata/wiki.

To do so, a candidate 2-3 hr window with likely activity (as reported by sighting networks / Orcasound listeners) is identified. Data is then processed from Orcasound's S3 archives as follows:
- Format conversion (HLS -> concatenated wav file)
- Audio is split into 1-minute easily browsable "sessions"
- Data to use for labelling/training is prioritized as follows:
  - Candidates are selected for labelling using predictions from an ML model, using a mid-low threshold (tuned for high recall). This helps discard data & prioritize labelling effort.
Each round generates new labelled data that improves models trained on this data, making them more robust to varied acoustic conditions at different hydrophone nodes.
Held-out test sets have also been created in a similar fashion as accuracy and robustness benchmarks.
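
The splitting and prioritization steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual pipeline: the function names, the per-session prediction format (a list of confidence scores per 1-minute session) and the threshold value of 0.3 are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def session_bounds(
    total_seconds: float, session_seconds: float = 60.0
) -> List[Tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows in seconds.

    The last window is shorter if the recording length is not a multiple
    of `session_seconds`.
    """
    bounds = []
    start = 0.0
    while start < total_seconds:
        bounds.append((start, min(start + session_seconds, total_seconds)))
        start += session_seconds
    return bounds


def select_candidates(
    predictions: Dict[str, List[float]], threshold: float = 0.3
) -> List[str]:
    """Return session ids worth labelling, highest-scoring first.

    `predictions` maps a (hypothetical) session id to the model's confidence
    scores within that session. A session is kept if any score reaches
    `threshold`; a mid-low threshold favours recall over precision.
    """
    scored = {
        session_id: max(scores)
        for session_id, scores in predictions.items()
        if scores and max(scores) >= threshold
    }
    return sorted(scored, key=scored.get, reverse=True)
```

Sessions that never reach the threshold are dropped, which is how a high-recall model helps discard data before any human listening happens.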

Flowchart of feedback loop between model & human listeners

Architecture

This prototype is a single-page application with a simple Flask backend that interfaces with Azure blob storage. For simplicity and ease of access, this version also uses blob storage as a sort of database: a JSON file acts as a single entry, and separate containers act as tables/collections. For this hack, that makes it easy to do quick-and-dirty viewing/editing in Azure Storage Explorer (or any equivalent blob viewer for S3 etc.).

Backend API:

GET /fetch/session/roundid

Scans the getcontainer container for an unlabelled session, picks one at random & returns a {sessionid=X} response. The sessionid is simply the name of the corresponding X.JSON file in the container. Also updates/resets the internal global variable backend_state, which contains info for the progress bar.
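
As a rough sketch of the selection logic (not the code in podcast_server.py), the function below works on plain lists of blob names rather than calling Azure. It assumes that a session counts as labelled once a JSON blob with the same name exists in postcontainer; that rule, the function name and the parameters are hypothetical.

```python
import random
from typing import Dict, Iterable, Optional


def pick_unlabelled_session(
    candidate_blobs: Iterable[str],
    submitted_blobs: Iterable[str],
    rng: Optional[random.Random] = None,
) -> Optional[Dict[str, str]]:
    """Pick a random session that has no submitted annotation yet.

    `candidate_blobs` are blob names from getcontainer, `submitted_blobs`
    are blob names from postcontainer (assumption: same file name means
    the session was already labelled). Returns None when nothing is left.
    """
    rng = rng or random.Random()

    def session_id(blob_name: str) -> str:
        # "X.json" -> "X"; a name without an extension is returned as-is.
        return blob_name.rsplit(".", 1)[0]

    submitted = {session_id(name) for name in submitted_blobs}
    remaining = sorted(
        session_id(name)
        for name in candidate_blobs
        if session_id(name) not in submitted
    )
    if not remaining:
        return None
    return {"sessionid": rng.choice(remaining)}
```

Passing a seeded random.Random makes the choice reproducible, which is handy when debugging locally.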

GET /load/session/roundid/sessionid

GET Azure blob wav

Fetches the corresponding JSON file from the getcontainer container. (For an example, see example-load.json.) The JSON contains backend_state for the progress bar and a uri that points the client directly to the corresponding audio file in blob storage.
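
To illustrate how the uri can point the browser straight at the audio, here is a minimal sketch under stated assumptions. The URL pattern is the standard public endpoint for Azure blobs; the helper names, the "annotations" key used in the test data, and the idea that the server attaches backend_state and uri to the stored JSON are assumptions, so check example-load.json for the real schema.

```python
from typing import Any, Dict
from urllib.parse import quote


def audio_blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the public HTTPS URL of a blob in an Azure Storage account."""
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_name)}"
    )


def build_load_response(
    stored: Dict[str, Any], backend_state: Dict[str, Any], audio_url: str
) -> Dict[str, Any]:
    """Combine the stored session JSON with progress info and the audio uri.

    Hypothetical helper: copies `stored` so the original dict is untouched.
    """
    response = dict(stored)
    response["backend_state"] = backend_state
    response["uri"] = audio_url
    return response
```

Because the client downloads the .wav from this URL itself, the Flask server never has to proxy audio data.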

POST /submit/session/roundid/sessionid

Writes a JSON file to the postcontainer container. (For an example, see example-submit.json, which has the same schema.) Also updates the internal global variable backend_state, which contains info for the progress bar.
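
The progress tracking could look something like the sketch below. The real contents of backend_state are not documented here, so the class and its two counters are purely hypothetical; it only illustrates the reset-on-fetch and increment-on-submit behaviour described above.

```python
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class BackendState:
    """Hypothetical progress counters for the current round."""

    total_sessions: int = 0
    annotated_sessions: int = 0

    def reset(self, total_sessions: int, annotated_sessions: int) -> None:
        """Re-sync the counters, e.g. after re-scanning the containers."""
        self.total_sessions = total_sessions
        self.annotated_sessions = min(annotated_sessions, total_sessions)

    def record_submission(self) -> None:
        """Count one more annotated session, never exceeding the total."""
        self.annotated_sessions = min(
            self.annotated_sessions + 1, self.total_sessions
        )

    def as_json(self) -> Dict[str, int]:
        """Return a plain dict that can be embedded in a JSON response."""
        return asdict(self)
```

Note that this simple counter would count a resubmitted session twice; re-syncing from the containers on each fetch is one way to correct for that.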

Client logic:

Primary logic is defined in main.js.

- fetchUrl, dataUrl and postUrl in index.html define the API endpoints described above.
- The client first checks for the sessionid URL parameter & runs loadSession or fetchAndLoadSession as appropriate.
- This is done on page load and whenever a submit/skip button is clicked.
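
The branching on the sessionid parameter is implemented in main.js; the snippet below restates the same decision in Python purely for illustration, to stay consistent with the backend language. The function choose_client_action is hypothetical and only returns the name of the client function that would run.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def choose_client_action(page_url: str) -> Tuple[str, Optional[str]]:
    """Decide which client routine applies to a given page URL.

    Returns ("loadSession", <id>) when a non-empty sessionid parameter is
    present, otherwise ("fetchAndLoadSession", None).
    """
    query = parse_qs(urlparse(page_url).query)
    session_ids = query.get("sessionid")
    if session_ids and session_ids[0]:
        return ("loadSession", session_ids[0])
    return ("fetchAndLoadSession", None)
```

A shared link with ?sessionid=X therefore always reopens the same session, while a bare URL asks the backend for a random unlabelled one.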

Use & setup

Setup & local debugging

1. Create an isolated Python environment and run pip install --upgrade pip && pip install -r requirements.txt. (Python 3.6.8 has been tested, though more recent versions should likely work, as the dependencies are quite simple.)
2. Set the environment variables FLASK_APP=podcast_server.py and FLASK_ENV=development. If you haven't made your own CREDS file yet, see #3. Once that's done, start the server from this directory with python -m flask run, and browse to the link shown in the terminal (e.g. http://127.0.0.1:5000/). Edge and Chrome have been tested.
3. CREDS.yaml specifies how the backend authenticates with blob storage and which container names to use. The provided file is a template and should be replaced:
- If you would like to test with an ongoing Pod.Cast round, ask for the credentials on the Orcasound slack
- If you are using your own blob account, see section Using your own blob storage

Note that when you run this locally, you will still be connecting & writing to the actual blob storage specified in CREDS.yaml so be careful.

Using your own blob storage

This assumes you have already created an Azure Storage account & know how to view & access it using Azure Storage Explorer.

1. Enable a CORS rule on the account. In short, this allows a browser client to make a request directly to blob storage to retrieve a *.wav file.

Screenshot of Azure Storage explorer showing CORS permissions

2. Make sure you have 3 containers:
  - audiocontainer: .wav audio files (~1 min duration, as each file forms one page/session)
  - getcontainer: model predictions in JSON format (see example-load.json), one for each .wav file
  - postcontainer: destination for user-submitted annotations in JSON format (see example-submit.json)
3. Enable public read-only access to blobs in audiocontainer (select the "blobs" option). Along with #1, this is required for the browser to directly retrieve *.wav files.

Screenshot of Azure Storage explorer to set public access level

Deployment to Azure App Service

Prerequisite: Install Azure CLI

Authenticate and set up your local environment to use the right subscription:

az login 
az account list --output table 
az account set --subscription SUBSCRIPTIONID

In the root directory of your application, create a deployment config file at .azure/config. This contains details about your resource group, appservice plan to use, etc. (An example file is at .azure/config)
Now run the following commands to deploy the app. The first command packages up your local directory into a *.zip and deploys the app on Azure. Note that with the --dryrun flag it only shows a summary of the operation without executing it, so remove the flag to actually deploy. If an app with the same name as in the deployment config file exists, it is updated; otherwise a new app is created. The second command only needs to be run the first time, to register the entry point of the app (see note below).

az webapp up --sku B1 --dryrun
az webapp config set -g mldev -n aifororcas-podcast --startup-file "gunicorn --bind=0.0.0.0 --timeout 600 podcast_server:app"

This deployment example is loosely based on the Quickstart. We make a change to the startup command to register the different name of our app file podcast_server.py. (FYI some more details about the CLI commands used here are at: az-webapp-up, configuring-python-app)

References

This code uses a fork of audio-annotator for the frontend code. audio-annotator uses wavesurfer.js for rendering and playing audio. Please refer to the respective references for more info on the core functions/classes used in this repo. (Note: the wavesurfer.js version used here is older than the current docs).

Icons used in readme flowcharts were made by Prosymbols from www.flaticon.com.
