As of the 2024 Hackathon, we have about 4000 false positive candidates that have been moderated by me, Dave @dbainj1, or Val. Each candidate is 60 seconds long and contains a number of model predictions (each with a start time and confidence level).
It would be ideal to iteratively retrain the model using these false positives. We have aspired to do such retraining annually during the Microsoft hackathons, but might eventually do it more frequently -- e.g. monthly, or whenever we reach a threshold of new false positives. Someday we may partition the false positives by location to fine-tune models for hydrophones in specific geographic locations.
Here are some resources and steps that could be used to implement a retraining solution:
[ ] Use the OrcaHello API to query for (recent) false positive candidates
[ ] New API -- Not ready yet, but you could eventually query for any state of a candidate (set of detections): Unreviewed, Negative, Positive, or Unknown.
[ ] Parse the API query results (JSON) to obtain the start time of each detection within the candidate
The Swagger API documentation allows you to explore the data schema and JSON results of a query.
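As a minimal sketch of the query-and-parse steps above -- assuming a REST endpoint path, query parameters, and JSON field names that would all need to be checked against the Swagger documentation -- the retraining job might start with something like:

```python
# Hypothetical sketch: the endpoint path, query parameters, and JSON field
# names below are assumptions -- confirm them against the Swagger docs.
import requests

ORCAHELLO_API = "https://orcahello.example.net/api"  # placeholder base URL

def fetch_candidates(state="Negative", from_date="2024-01-01", page_size=100):
    """Query the OrcaHello API for candidates in a given moderation state."""
    resp = requests.get(
        f"{ORCAHELLO_API}/detections",  # hypothetical endpoint
        params={"state": state, "fromDate": from_date, "pageSize": page_size},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def detection_windows(candidate):
    """Yield (start_sec, confidence) for each prediction in a 60-sec candidate."""
    for pred in candidate.get("predictions", []):  # hypothetical field names
        yield pred["startTime"], pred["confidence"]
```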
[ ] Retrieve the .wav file from blob storage and pre-process it for retraining
[ ] Approach 1: Treat each 60-sec candidate as a source of noise, e.g. break it into 2.5-sec segments, all of which are used as negatives when retraining the OrcaHello binary SRKW call classifier
[ ] Approach 2: Treat only each detection within the candidate as a source of negative training data (i.e. use the start/end times). In most cases this will only use a subset of the full 60 seconds of audio data.
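A sketch of the retrieval and both preprocessing approaches, assuming the .wav blobs are reachable by URL (the actual container layout and access credentials aren't specified here):

```python
# Sketch of audio retrieval plus the two segmentation approaches above.
# Blob access details are placeholders; the 2.5-sec segment length follows
# the segment size mentioned in Approach 1.
import io
import soundfile as sf
from azure.storage.blob import BlobClient

def load_wav(blob_url):
    """Download a candidate's 60-sec .wav from blob storage."""
    data = BlobClient.from_blob_url(blob_url).download_blob().readall()
    audio, sr = sf.read(io.BytesIO(data))
    return audio, sr

def negatives_approach_1(audio, sr, seg_sec=2.5):
    """Approach 1: slice the whole candidate into fixed-length negatives."""
    n = int(seg_sec * sr)
    return [audio[i:i + n] for i in range(0, len(audio) - n + 1, n)]

def negatives_approach_2(audio, sr, windows, seg_sec=2.5):
    """Approach 2: keep only a window starting at each (false) detection."""
    n = int(seg_sec * sr)
    return [audio[int(start * sr):int(start * sr) + n] for start, _conf in windows]
```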
The same workflow could also query the API for new candidates that moderators have confirmed are true positives.
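With the hypothetical helper sketched above, that would just be a different state filter:

```python
# Confirmed true positives, using the same (assumed) state parameter:
positives = fetch_candidates(state="Positive", from_date="2024-01-01")
```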
[ ] Retrain the model using the new true and false positive samples
[ ] Use some sort of versioning scheme to name the new model, and include that name in future detection/candidate metadata
[ ] Test performance of the new model (@bnestor had some great ideas for assessing performance and benchmarking in this pre-hackathon discussion)
[ ] If performance is improved, redeploy the model (on all hydrophone locations)
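A sketch of the version-tag-and-compare steps, under the assumption that the old and new models are scored on the same held-out benchmark set. Precision/recall is just one reasonable metric pair here; see @bnestor's suggestions for more rigorous options.

```python
# Assumed naming scheme and redeploy criterion -- illustrative only.
from datetime import date
from sklearn.metrics import precision_score, recall_score

def model_version_tag():
    """Date-stamped model name to embed in future detection/candidate metadata."""
    return f"orcahello-srkw-{date.today():%Y%m%d}"

def should_redeploy(y_true, old_pred, new_pred):
    """Redeploy only if the retrained model matches or beats the current one."""
    return (precision_score(y_true, new_pred) >= precision_score(y_true, old_pred)
            and recall_score(y_true, new_pred) >= recall_score(y_true, old_pred))
```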