Add additional label types (type of animal, type of signal)

aserfass-msft commented 2 years ago

Want to create a quick proof of concept to show additional data that could be collected and used to train more nuanced models.

1) Animal type: orca, bird, and other animals that sound similar to orcas (but are ultimately noise for our purposes).
2) Orca signals (calls, whistles, clicks, cavitation). Can we detect some types of signals from further away (indicators of arrival)?

scottveirs commented 2 years ago

Hey Andy, good idea! Thanks for creating this issue, which I'll label now as related to model training. Happy to talk with you and @dbainj1 (David Bain) about this in 92 or on a call tomorrow, but here are a few resources that could be fun to explore in the interim:

Since the 2021 hackathon, I've been working on consistently labeling false positives with tags that could be useful to you. You can explore these through the OrcaHello Dashboard (select the "all time" filter, then browse the tag cloud), or you should be able to query the API for them and filter by tag to get 60s audio clips.
Tools I've been using to be more consistent: the OrcaHello tag curation page (a feature request that @micowan implemented in 2021) and this annotation dictionary for Salish Sea signals (for aligning with labels used by our Canadian colleagues).
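As a rough illustration of the "filter by tag" step, here is a minimal sketch that works on candidate records already fetched into memory. The record fields (`id`, `tags`, `audio_uri`) and the sample data are assumptions made up for this example, not the actual OrcaHello API schema.

```python
from collections import Counter

# Hypothetical candidate records. The field names ("id", "tags", "audio_uri")
# are assumptions for illustration, not the real OrcaHello API schema.
candidates = [
    {"id": "a1", "tags": ["humpback"], "audio_uri": "a1.wav"},
    {"id": "b2", "tags": ["pigeon guillemot", "boat"], "audio_uri": "b2.wav"},
    {"id": "c3", "tags": ["Humpback", "rain"], "audio_uri": "c3.wav"},
]


def normalize(tag):
    """Lower-case and trim a tag so 'Humpback' and 'humpback ' match."""
    return tag.strip().lower()


def tag_counts(records):
    """Count how often each normalized tag appears (a text-only tag cloud)."""
    return Counter(normalize(t) for r in records for t in r.get("tags", []))


def filter_by_tag(records, tag):
    """Return the records that carry the given tag, ignoring case."""
    wanted = normalize(tag)
    return [r for r in records
            if wanted in {normalize(t) for t in r.get("tags", [])}]


# Collect the 60s clips tagged as humpback (two of the three sample records).
humpback_clips = [r["audio_uri"] for r in filter_by_tag(candidates, "humpback")]
```

Normalizing tags before comparing is a design choice here: moderator-entered tags tend to vary in case and spacing, so an exact string match would likely miss clips.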

One hacky idea (for next year?) would be to adapt the Pod.Cast tool, but implement a different pre-labeling algorithm (e.g. a click detector or a whistle detector), possibly one developed for a different species like a bat or bird. One could feed it 60s candidates from the false positive tag cloud (e.g. humpback or pigeon guillemot) and then crowdsource (to me) the human adjustment of the pre-labels, resulting in new training data in the Pod.Cast .tsv format.
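To make the pre-labeling idea concrete, here is a minimal sketch that uses a simple short-window energy threshold as a stand-in for a real click detector (it is not a bat or bird detector, just the simplest thing that flags loud transients). The function names, the threshold values, and the three TSV columns are assumptions for illustration; the actual Pod.Cast .tsv schema may differ.

```python
import csv
import io
import math


def prelabel_clicks(samples, sample_rate, window_s=0.01, threshold_ratio=5.0):
    """Return (start_time_s, duration_s) spans whose short-window RMS
    exceeds threshold_ratio times the median window RMS."""
    win = max(1, int(round(window_s * sample_rate)))
    rms = []
    for i in range(0, len(samples) - win + 1, win):
        chunk = samples[i:i + win]
        rms.append(math.sqrt(sum(x * x for x in chunk) / win))
    if not rms:
        return []
    # Median RMS serves as a crude noise floor; guard against digital silence.
    floor = sorted(rms)[len(rms) // 2] or 1e-12
    spans, start = [], None
    for k, value in enumerate(rms):
        loud = value > threshold_ratio * floor
        if loud and start is None:
            start = k
        elif not loud and start is not None:
            spans.append((start * win / sample_rate,
                          (k - start) * win / sample_rate))
            start = None
    if start is not None:  # a span that runs to the end of the clip
        spans.append((start * win / sample_rate,
                      (len(rms) - start) * win / sample_rate))
    return spans


def to_tsv(wav_filename, spans):
    """Serialize pre-labels as TSV text (hypothetical column names)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["wav_filename", "start_time_s", "duration_s"])
    for start, dur in spans:
        writer.writerow([wav_filename, f"{start:.3f}", f"{dur:.3f}"])
    return out.getvalue()


# Synthetic 1 s clip at 1 kHz: quiet background with one loud 20 ms burst.
samples = [0.01 * (-1) ** i for i in range(1000)]
for i in range(500, 520):
    samples[i] = 1.0 * (-1) ** i

spans = prelabel_clicks(samples, sample_rate=1000)
tsv_text = to_tsv("clip.wav", spans)
```

A human would then adjust these rough spans in the annotation tool; the detector only needs to be good enough to save clicking time, not to be accurate on its own.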

aserfass-msft commented 2 years ago

Thanks, I'll take a look and try to find time for us to meet.

scottveirs commented 2 years ago

Not sure if this helps with your quick proof of concept, Andy, but here is some documentation I've been working on for OrcaHello moderators that could provide some context for how labels are generated.

https://github.com/orcasound/aifororcas-livesystem/wiki/Moderation

scottveirs commented 2 years ago

Also, @aserfass-msft -- regarding the last part of your second bullet -- yes, we do often hear some signals before others as the SRKWs approach. It's not uncommon for me to hear their clicks first. (I also notice the lower end of their echolocation clicks across the top [high frequency] edge of OrcaHello spectrograms, sometimes when they aren't yet audible due to masking noise at lower frequencies where we hear best.) Typically, after first sensing the clicks intermittently, I hear faint calls, and then all signals, including whistles, as they make their closest approach passing the hydrophone.

scottveirs commented 1 year ago

Hey @aserfass-msft -- I'm thinking of closing this issue, as I think that tags added by OrcaHello moderators should be sufficient -- both for your hackathon efforts to create a proof of concept and for future model advancements. I've created #127 so that model developers could query the OrcaHello true or false positives for particular signals of interest via a new tag parameter in the OrcaHello API.

LMK if you think this issue should remain open for particular reasons!

scottveirs commented 11 months ago

With the new bulk/in-line editing features that @micowan is adding, and on-going efforts to train new moderators to consistently tag candidates and curate the tags, I think we can close this issue.

orcasound / aifororcas-livesystem

Add additional label types (type of animal, type of signal) #113