orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal) :star:
http://orcahello.ai4orcas.net/
MIT License

OrcaHello false negative event: SRKWs at Orcasound Lab on 11/2/21 #77

Open scottveirs opened 2 years ago

scottveirs commented 2 years ago

For this event, I received no notification as a moderator, nor did the OrcaHello system seem to create any candidates that are visible within the moderator portal. Having reviewed much of the continuous recording, I believe there are many SRKW calls that would have been detected by the system as it was performing in late 2020 and early 2021.

Orcasound Lab (Haro Strait)

11/2/2021 | 14:05:20

11/2/2021 | 16:15

More than 2 hours of SRKW signals at intermediate to high SNR. Signals included an unusually wide variety of SRKW calls; Monika Wieland estimated hearing about 2/3 of the SRKW repertoire, whereas we typically hear <1/4. There were

TBD

I will provide a link to a blog post presenting the continuous recording (HLS segments transcoded to .ogg and .mp3). For now, the start/stop date-times are listed in the shared Orcasound annotation candidate spreadsheet.

prakruti commented 2 years ago

Thanks for filing the issue @scottveirs, will take a look.

prakruti commented 2 years ago

@scottveirs could you remind me of the 3 nodes' restarting schedule?

scottveirs commented 2 years ago

Sure, @prakruti, the current schedule is:

  1. Orcasound Lab -- restarts streaming container every six hours, with first local daily restart at 0:30:19 GMT-08:00
  2. Port Townsend -- restarts streaming container every six hours, with first local daily restart at 04:30:21 GMT-08:00
  3. Bush Point -- restarts streaming container every six hours, with first local daily restart at 04:30:18 GMT-08:00
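
A quick way to check whether a scheduled restart could coincide with the event is to expand the schedule into concrete times. The sketch below is only an illustration: the helper names `daily_restarts` and `restarts_in_window` are hypothetical, and it treats the restart times and the event window as naive local clock times on the same day (the GMT offset is not modeled).

```python
from datetime import datetime, timedelta

def daily_restarts(first_restart, interval_hours=6):
    """Return every restart on the same calendar day as first_restart."""
    restarts = []
    t = first_restart
    while t.date() == first_restart.date():
        restarts.append(t)
        t += timedelta(hours=interval_hours)
    return restarts

def restarts_in_window(restarts, start, end):
    """Return the restarts that fall inside [start, end]."""
    return [t for t in restarts if start <= t <= end]

# Orcasound Lab: first daily restart at 0:30:19, then every six hours.
# Assumption: schedule and event times share the same local clock.
restarts = daily_restarts(datetime(2021, 11, 2, 0, 30, 19))

# Event window from the issue description (11/2/2021, 14:05:20 to 16:15).
event_start = datetime(2021, 11, 2, 14, 5, 20)
event_end = datetime(2021, 11, 2, 16, 15, 0)

overlap = restarts_in_window(restarts, event_start, event_end)
print([t.strftime("%H:%M:%S") for t in restarts])
print("restarts during event:", overlap)
```

Taking the listed times at face value, the Orcasound Lab restarts land at 00:30:19, 06:30:19, 12:30:19 and 18:30:19, so none falls inside the event window. That does not rule out unscheduled restarts or container crashes.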

prakruti commented 2 years ago

Ongoing investigation update: Going by the triaging list and crossing out options.

  1. Hydrophones are broken
  2. AWS buckets are not picking up audio correctly
  3. Inference system (ACI or AKS) was down: ACI was in flux, and I believe the container was offline for some period during the event. Notably, there was a stop-start container event at 16:55 on 11/2. IMO we should still run a long-running test to find out why the containers crash in the first place (complete with stack traces, etc.). I will attempt to run that test as part of this bug as well. I suspect it may have something to do with the Orcasound Raspberry Pi restarts that we don't handle properly, but that's just a guess.
  4. Inference system has false negatives: I am currently looking at this option. I ran the inference system on the given date range with a local threshold of 0.5 and a global threshold of 3, using the same model. The first positive is detected at 2021-11-02-15-07, which is about 1 hour after the event start time. (Since we did not see candidates in the portal, this further suggests that the ACI container was not functional.) I could see some local predictions that went up to 0.3 but did not cross the 0.5 threshold. If that turns out to be the root cause of the false negatives, our investigation should ideally yield a recommendation on whether to update the threshold or train the model with more data. I need some more time to conclude the investigation, though.
  5. CosmosDB is broken.
  6. Moderator portal is broken.
  7. Notification system broken (Azure Functions)
  8. SendGrid broken
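
To make the threshold discussion in item 4 concrete, here is a minimal sketch of how a local and a global threshold might combine. It assumes that each audio segment yields a list of per-window confidences, that a window counts as positive when its confidence exceeds the local threshold, and that the segment is flagged when the number of positive windows reaches the global threshold. The function name `segment_is_positive` and the sample values are hypothetical and are not taken from the recording or from the checked-in inference code.

```python
def segment_is_positive(confidences, local_threshold=0.5, global_threshold=3):
    """Flag a segment when enough windows exceed the local threshold.

    Assumption: 'global threshold' means the minimum count of positive
    local predictions; the real system may define it differently.
    """
    positives = sum(1 for c in confidences if c > local_threshold)
    return positives >= global_threshold

# Made-up confidences: calls present, but the model is unsure (peaks near 0.3).
weak_segment = [0.10, 0.30, 0.28, 0.30, 0.05]

# Made-up confidences: three windows clearly above 0.5.
strong_segment = [0.20, 0.60, 0.70, 0.55, 0.10]

print(segment_is_positive(weak_segment))                        # missed at 0.5
print(segment_is_positive(strong_segment))                      # flagged at 0.5
print(segment_is_positive(weak_segment, local_threshold=0.25))  # lower threshold
```

Under these assumptions, a segment whose local predictions peak around 0.3 is missed at a local threshold of 0.5 but would be flagged at 0.25. Lowering the threshold would also raise the false positive rate, which is why the choice between retuning and retraining needs the fuller investigation.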

Other comments: I tested the inference system locally with python src/LiveInferenceOrchestrator.py --config ./config/Test/FastAI_DateRangeHLS_OrcasoundLab.yml (with the correct start and end times). Using DateRangeHLS to test the inference system was broken and so was the actual checked-in live inference script. (PR with fixes coming soon). We really need build + unit tests to prevent this from happening in the future.

Molkree commented 2 years ago

> Using DateRangeHLS to test the inference system was broken and so was the actual checked-in live inference script. (PR with fixes coming soon).

Hey @prakruti, could you describe what was broken and your fixes? I can't find the PR 😅