orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal)
http://orcahello.ai4orcas.net/
MIT License

Compute system latencies and visualize #157

Open scottveirs opened 1 month ago

scottveirs commented 1 month ago

A potential Key Performance Indicator (KPI) for the OrcaHello system is how long it takes from the first high-confidence (>50%) AI detection to notification of real-time end-users.

Problem

We don't track OrcaHello system performance, so we have only a rough, intuitive answer to this question:

What is the latency of each step in the ML pipeline, including the humans in the loop?

Background

Here's an initial articulation of the steps where latency could be measured, visualized, and eventually reduced over time:

1. Time between audio data acquisition at a hydrophone node and moderator notification output. This is governed by each of:
   a) the duration of each live-streamed audio segment (currently 10 seconds)
   b) the duration of an OrcaHello candidate (currently a 60-sec concatenation of ~6 Orcasound HLS segments)
   c) the time it takes to run the model on all ~2.5-sec sections of the 60-sec candidate
   d) the time it takes to compute mean confidence and a spectrogram for the candidate
   e) the time it takes to issue a notification to a moderator
2. The delay between when the moderator notification is sent and when a moderator validates the candidate
3. The delay between moderator validation and notification of end-users (currently dictated by the SendGrid integration, but potentially sped up and made cheaper by integration with the general Orcasound notification system)
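As a rough illustration, the stage-1 latency budget above can be sketched as a sum of its parts. The only numbers taken from this issue are the 10-sec HLS segment and 60-sec candidate durations; the inference, post-processing, and notification times below are placeholder assumptions, not measurements:

```python
from dataclasses import dataclass

@dataclass
class Stage1Latency:
    """Hypothetical worst-case latency budget for steps (a)-(e)."""
    candidate_s: float = 60.0    # (a)+(b): candidate = ~6 x 10-sec HLS segments
    inference_s: float = 5.0     # (c): model over ~2.5-sec sections (assumed)
    postprocess_s: float = 1.0   # (d): mean confidence + spectrogram (assumed)
    notify_s: float = 2.0        # (e): issuing the moderator notification (assumed)

    def total(self) -> float:
        # Worst case: audio captured at the start of a candidate waits the
        # full candidate duration before the pipeline even begins.
        return self.candidate_s + self.inference_s + self.postprocess_s + self.notify_s

print(f"Worst-case stage-1 latency: {Stage1Latency().total():.0f} s")  # 68 s
```

Replacing the assumed constants with measured timestamps from each pipeline stage would turn this sketch into a real metric.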

For example, a simple subtraction of the two date-times displayed in the moderator portal would be an easy initial metric to display:

*(Screenshot of the moderator portal, 2024-08-08 11:24 AM)*
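That "simple subtraction" might look like the following sketch; the timestamp values and ISO format are assumptions standing in for the two date-times shown in the portal:

```python
from datetime import datetime

# Hypothetical timestamps: when the candidate was detected vs. when a
# moderator validated it (format assumed; adjust to match the portal).
detected = datetime.fromisoformat("2024-08-08 11:24:37")
moderated = datetime.fromisoformat("2024-08-08 11:52:03")

moderation_latency = moderated - detected
print(f"Moderation latency: {moderation_latency.total_seconds() / 60:.1f} min")
```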

Proposed solutions:

  1. Compute some measures of latency and display them in the OrcaHello Dashboard (for the default or requested time period). It might be interesting to plot the metrics for all candidates, and then break them down by true positives and false positives. This could be used internally (for authenticated users) or even publicly to promote friendly competition between moderators. (Of the moderator beta-testers, who is fastest to respond? Who moderates more at night than during the day? How does the team do during holidays versus other periods, i.e. when perhaps all three are distracted with family or travel?)
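The proposed breakdowns (all candidates, true vs. false positives, per moderator) could share one aggregation helper. A minimal sketch, with invented field names and example records standing in for real dashboard data:

```python
from statistics import median

# Hypothetical candidate records; the field names are assumptions.
candidates = [
    {"moderator": "A", "true_positive": True,  "latency_min": 12.0},
    {"moderator": "A", "true_positive": False, "latency_min": 45.0},
    {"moderator": "B", "true_positive": True,  "latency_min": 8.0},
    {"moderator": "B", "true_positive": True,  "latency_min": 20.0},
]

def median_latency(records, **filters):
    """Median moderation latency over records matching the given field values."""
    vals = [r["latency_min"] for r in records
            if all(r[k] == v for k, v in filters.items())]
    return median(vals) if vals else None

print("All candidates:", median_latency(candidates))                      # 16.0
print("True positives:", median_latency(candidates, true_positive=True))  # 12.0
print("Moderator B   :", median_latency(candidates, moderator="B"))       # 14.0
```

The same filter-then-aggregate pattern would support the time-of-day and holiday breakdowns by adding a timestamp field to each record.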
  2. Compute latency metrics within Azure, then aggregate them with similar measures (e.g. human detection latency within the Orcasound live-listening web app), and visualize the results in a way that tracks and incentivizes human+machine system performance. One place to do this might be the network status dashboard (drafted in early 2024 by @dthaler) or a similar "high-level" dashboard that has been discussed in past hackathons; see #88 ...
scottveirs commented 1 month ago

For fun, and only for true positives from OrcaHello and/or human listeners, I've been computing some of these metrics in the shared Orcasound event spreadsheet. It may offer some additional latency metric ideas (some of which would require parsing email send/receive times):

*(Screenshot of the shared Orcasound event spreadsheet, 2024-08-08 11:36 AM)*