orcasound / orca-action-workflow

GitHub Actions to automate Orcasound tasks.
MIT License
7 stars 6 forks

Humpback whale algorithm from Google #14

Molkree opened 3 years ago

Molkree commented 3 years ago

On OOI data https://tfhub.dev/google/humpback_whale/1

valentina-s commented 3 years ago

The extract_embedding_from_signal function from Audio_embeddings.ipynb can apply the humpback model to a .wav file. There was a confirmed humpback call on 10/26/21 at 13:40, so that can be a good time frame for testing (see the blog on humpback activity).
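For reference, a minimal sketch of loading a .wav into the mono float waveform the TF Hub model expects (the hub page asks for 10 kHz mono input; the helper name and the 16-bit PCM assumption here are illustrative, not from the notebook):

```python
import struct
import wave

def load_wav_mono(path):
    """Read a 16-bit PCM wav file and return (sample_rate, mono samples in [-1, 1])."""
    with wave.open(path, "rb") as f:
        assert f.getsampwidth() == 2, "expects 16-bit PCM"
        rate = f.getframerate()
        ch = f.getnchannels()
        raw = f.readframes(f.getnframes())
        ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
        # average interleaved channels down to mono and scale to [-1, 1]
        mono = [sum(ints[i:i + ch]) / ch / 32768.0 for i in range(0, len(ints), ch)]
    return rate, mono
```

The actual scoring (loading the model from the TF Hub URL and calling its score signature on the waveform) would follow, with resampling to 10 kHz first if the recording uses a different rate.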

Benjamintdk commented 2 years ago

Hi @Molkree,

Can I clarify if this issue is just to test the model on the audio file first, or to integrate it into the GH Actions workflow? Thanks.

valentina-s commented 2 years ago

@Benjamintdk, @wetdog already has the functionality to apply the model to the audio files, but it would be great to integrate it into a pipeline that runs as new data comes in and stores summary statistics of the output.

Molkree commented 2 years ago

@Benjamintdk, as Valentina mentioned above, you can start by just applying the existing function from that notebook and seeing how it looks for the time with the confirmed humpback whale call.

But the ultimate goal is to run it continuously, yes. I think that running it every day over the previous 24 hours would be overkill, and we will probably need to find "interesting" time segments to run the neural net on. But that's a different issue 😄

Making the model work on recordings from Orcasound is the first step, probably by semi-manually supplying audio at first, and later with a small script that takes a start/end time and the node.

Benjamintdk commented 2 years ago

@valentina-s @Molkree Thanks so much for clarifying, I will try testing the model in the notebook first before looking to integrate it with the script. One last question, though: will the model be used to process Orcasound data or OOI data (since you mentioned OOI in the original post)?

Benjamintdk commented 2 years ago

Hi @Molkree, I was wondering how you ascertain the specific Unix timestamp when accessing the stream data via the AWS CLI? Right now I have to manually convert the Unix timestamp to UTC via an online converter, then to PDT, every time. Is that the right way or is there an easier way that I'm missing out on?
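A small stdlib helper would avoid the online converter (the function name is illustrative; a PDT version would swap in zoneinfo's America/Los_Angeles):

```python
from datetime import datetime, timezone

def ts_to_utc(unix_ts):
    """Convert a Unix timestamp in seconds to a UTC time string."""
    return datetime.fromtimestamp(unix_ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
```

For example, `ts_to_utc(1578449160)` returns `"2020-01-08 02:06:00"`.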

Benjamintdk commented 2 years ago

Hi @valentina-s, I have combined the functionalities as described and am able to generate embeddings from the .wav files, as well as use the model to predict correct probability scores on both positive and negative samples.

Could I also ask what sort of output summary statistics would be helpful/meaningful? Are you referring to the images and PCA results (explained variance) that the notebook generates, or something else, such as the times at which a humpback call is possibly detected? Thanks.

valentina-s commented 2 years ago

Great @Benjamintdk! Maybe as a starter you can use the scores because it is easier to visualize. We can think about a compact representation of the embeddings later.

valentina-s commented 2 years ago

Here is also a time when humpbacks were heard on orcasound_lab: "2020-01-08 2:06:00" (UTC). It would be interesting to see what the algorithm finds during this period. Scott and Emily most probably have many more recent examples.

valentina-s commented 2 years ago

Regarding the action workflow: updating a readme with a new plot would be a nice outcome.

Benjamintdk commented 2 years ago

@valentina-s thanks for the reply! I've evaluated the model on the time you provided, as well as on another example (2020-11-04 04:15:00 UTC) from this spreadsheet, and the model detects both successfully.

Regarding the new plots to be included in the action workflow, will the plots attached below suffice? They show the scores plotted against time, with the regions of interest being those above the 0.5 threshold. I'm thinking that the workflow can generate a new plot for the day every time it is run, and update the Readme as you have suggested.
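Those above-threshold regions of interest could be pulled out programmatically with something like this (a sketch; the function name is illustrative):

```python
def regions_above(scores, threshold=0.5):
    """Return (start, end) index pairs of contiguous runs where score > threshold."""
    regions, start = [], None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i                       # run begins
        elif s <= threshold and start is not None:
            regions.append((start, i - 1))  # run ends
            start = None
    if start is not None:                   # run reaches the end of the sequence
        regions.append((start, len(scores) - 1))
    return regions
```

For example, `regions_above([0.1, 0.6, 0.7, 0.2, 0.8])` gives `[(1, 2), (4, 4)]`.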

2020-01-08 (UTC)

2020-11-03 (UTC)

Benjamintdk commented 2 years ago

For some context regarding the weird, random spikes: the model predicts 3 scores for each wav file (each file is 10 seconds long, and the model scores windows of 3.92 seconds), and I took the max score to represent each wav file to simplify the plots. I wasn't sure if averaging would work better, but I felt it risks missing important detections that only appear in specific parts of the wav file. Some things I could explore further for improvement:

1) Confirm whether the weird spikes are due to humpback calls or just random noise by listening to those specific regions.
2) Determine in which third of the wav files these spikes occur most commonly (e.g. if it's always the last third, it is more likely an artifact of the audio file processing than a humpback call).
3) Try more negative examples/time periods to check for the presence of such spikes.
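The max-over-windows reduction described above is simple to factor out so that averaging (or other reductions) can be swapped in later (a sketch; per-file window scores are assumed to arrive as lists):

```python
def pool_scores(per_file_scores, reduce=max):
    """Collapse each file's window scores (e.g. 3 windows per 10 s file) to one value."""
    return [reduce(scores) for scores in per_file_scores]
```

For example, `pool_scores([[0.1, 0.9, 0.2], [0.3, 0.2, 0.1]])` gives `[0.9, 0.3]`.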

I can make a PR for the changes if this is tentatively ok.

Molkree commented 2 years ago

Will the model be used to process OrcaSound data or OOI data (cos you mentioned OOI in the original post)?

@Benjamintdk, hmmm, I actually don't remember why I mentioned OOI specifically; maybe because whoever suggested this said humpback whales might be around OOI nodes more often than Orcasound nodes? In any case, this should work for our recordings too, so maybe run on both? We'll see how it goes; it seems like you've already made it work for Orcasound.

Molkree commented 2 years ago

is there an easier way that I'm missing out on?

@Benjamintdk, urgh, sorry for taking so long to reply. Yes, there is a way: try using this package. You can see the usage here.

The package needs some polish but should work I think.

We will also need to integrate it with Orcasound workflows in this repo.

Molkree commented 2 years ago

I'm thinking that the workflow can generate a new plot for the day every time it is run, and update the Readme as you have suggested.

@Benjamintdk

How would the plot for the whole day look?

Do you mean you would want to auto-update the readme every day? I would be against that; it sounds like a good case for GitHub Pages or something similar, rather than committing directly to the repository.

will the plots attached below suffice?

Plots need a legend.

wetdog commented 2 years ago

Hi @Benjamintdk, great work on testing the humpback model output. I'm going to be more active from this month on, so reach out if you have doubts about the audio or the models.

- Just a reminder that the audio fed to the model needs to be at a 10 kHz sample rate in order to obtain valid scores from the model.

- The threshold value needs to be calibrated, as they suggest on the hub page: "For example, in our eval set, where about 10% of the examples were positives, precision was already 90% at a 'probability' threshold of 0.1". That step can be done with the exploration steps you suggested.

Finally, it is common in sound event detection to use a median filter to smooth the predictions of a classifier.
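The median filter mentioned here is only a few lines in pure Python (a sketch; window width 3 by default, with edge values replicated):

```python
from statistics import median

def median_smooth(scores, k=3):
    """Smooth a score sequence with a sliding median of odd width k."""
    assert k % 2 == 1, "window width must be odd"
    half = k // 2
    # replicate the edge values so the output has the same length as the input
    padded = [scores[0]] * half + list(scores) + [scores[-1]] * half
    return [median(padded[i:i + k]) for i in range(len(scores))]
```

A lone one-sample spike is suppressed: `median_smooth([0, 0, 1, 0, 0])` gives `[0, 0, 0, 0, 0]`, while sustained detections survive the filter.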

Benjamintdk commented 2 years ago

is there an easier way that I'm missing out on?

@Benjamintdk, urgh, sorry for taking so long to reply. Yes, there is a way: try using this package. You can see the usage here.

The package needs some polish but should work I think.

We will also need to integrate it with Orcasound workflows in this repo.

Thanks for sharing the tool, it will be really helpful!

I'm thinking that the workflow can generate a new plot for the day every time it is run, and update the Readme as you have suggested.

@Benjamintdk

How would the plot for the whole day look?

Do you mean you would want to auto-update the readme every day? I would be against that; it sounds like a good case for GitHub Pages or something similar, rather than committing directly to the repository.

will the plots attached below suffice?

Plots need a legend.

Agreed with the comment about not committing to the repository directly. The GitHub Pages idea sounds good, actually, but it would probably need to be set up in a separate PR first? Or perhaps the image could be saved in a folder on a separate branch first?

valentina-s commented 2 years ago

@Benjamintdk Plot looks good, and it is great that the events get detected! It is interesting, though, that the model output is spiky in general. Maybe certain frequencies trigger it more, or maybe the noise is just variable; listening through some examples may indeed shed some light. Another experiment could be to apply it to snippets with orca calls: certain orca calls are similar to certain humpback calls, but some humpback sounds are very distinctive and long, and I am sure @scottveirs will not confuse them. Regarding the strategy to combine scores, max sounds OK. One could add a follow-up filter over spans longer than 10 seconds to check whether the sounds are continuous; humpbacks usually have longer songs and are present for some time, so that would eliminate some individual spikes.
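The continuity filter suggested here could look like this (a sketch; requiring at least n consecutive above-threshold files means a detection must span more than one 10-second file):

```python
def sustained_detections(flags, min_run=2):
    """Keep only detections that belong to a run of at least min_run consecutive True flags."""
    out = [False] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1                      # extend the run of detections
            if j - i >= min_run:            # long enough: keep the whole run
                for k in range(i, j):
                    out[k] = True
            i = j
        else:
            i += 1
    return out
```

An isolated spike (`[True, False, True, True, False]` with `min_run=2`) becomes `[False, False, True, True, False]`: the single detection is dropped, the sustained pair is kept.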

valentina-s commented 2 years ago

@Molkree I believe @scottveirs might have mentioned that we know of certain times when there are humpbacks in the OOI stream, and that some of our collaborators at UW have that information, but I do not have the exact datetimes with me. We can do OOI later.

scottveirs commented 2 years ago

@Molkree I believe @scottveirs might have mentioned that we know of certain times when there are humpbacks in the OOI stream, and that some of our collaborators at UW have that information, but I do not have the exact datetimes with me. We can do OOI later.

@swearic Do you have date-times for any favorite humpback events on one of the Oregon shelf/slope OOI hydrophones?

For the Orcasound Lab events, you can search for humpbacks in the Orcasound event log for now, and/or talk with Emily in Slack about the annotated files she's completed thus far...

Benjamintdk commented 2 years ago

Hi @Benjamintdk, great work on testing the humpback model output. I'm going to be more active from this month on, so reach out if you have doubts about the audio or the models.

- Just a reminder that the audio fed to the model needs to be at a 10 kHz sample rate in order to obtain valid scores from the model.

- The threshold value needs to be calibrated, as they suggest on the hub page: "For example, in our eval set, where about 10% of the examples were positives, precision was already 90% at a 'probability' threshold of 0.1". That step can be done with the exploration steps you suggested.

Finally, it is common in sound event detection to use a median filter to smooth the predictions of a classifier.

Hi @wetdog, thanks for the advice and nice to meet you!

Sorry for making this post extra long; I just had a lot of thoughts on this. Do let me know if it's getting too long and whether it should perhaps be moved to Slack instead.

2020-01-08 (UTC) smoothed

2020-11-03 (UTC) smoothed

Benjamintdk commented 2 years ago

@Benjamintdk Plot looks good, and it is great that the events get detected! It is interesting, though, that the model output is spiky in general. Maybe certain frequencies trigger it more, or maybe the noise is just variable; listening through some examples may indeed shed some light. Another experiment could be to apply it to snippets with orca calls: certain orca calls are similar to certain humpback calls, but some humpback sounds are very distinctive and long, and I am sure @scottveirs will not confuse them. Regarding the strategy to combine scores, max sounds OK. One could add a follow-up filter over spans longer than 10 seconds to check whether the sounds are continuous; humpbacks usually have longer songs and are present for some time, so that would eliminate some individual spikes.

@valentina-s, noted on the longer songs; I noticed from the paper as well that they used 75-second clips in their test set, which could be done here and might improve the scores. Also noted on the possible confusion with resident killer whale calls, which I intend to clarify with Emily. Thanks for the guidance as well, @scottveirs.

Molkree commented 2 years ago

@Benjamintdk

The Github Pages idea sounds good actually, but would probably require it to be set up in a separate PR first? Or perhaps the image could be saved in a folder in a separate branch first?

Yeah, I guess? So do you suggest adding the image to the separate branch and deploying GH Pages from that branch?

This Action you linked is a bit funny, I'm not sure how it's better than just using git directly :) But I haven't looked into it too much so might be wrong! Do tell me if you think otherwise 😉

the plot at the bottom is that of a whole day's worth of audio

Oh, I didn't realize, thanks for the clarification.

it takes slightly over an hour on my local machine, though I have not yet explored refactoring my code to account for multiprocessing options yet. Do you have further concerns regarding this?

Slightly over an hour is actually good enough for 24 hours of data. But you're most likely using a GPU (right?), which we won't have in the GH Actions VM, so that's my concern. We won't know until we try though 😅

Molkree commented 2 years ago

Do let me know if it's getting too long and if it should be shifted to slack instead perhaps.

I prefer long posts and discussions on GitHub please! 😁

Benjamintdk commented 2 years ago

Yeah, I guess? So do you suggest adding the image to the separate branch and deploying GH Pages from that branch?

@Molkree, yep, I think that should work. I've actually created a gh-pages branch on my forked repo to try it out already. I can open a PR for this first if that's OK, but a gh-pages branch will need to be created on this repo before I can submit it.

This Action you linked is a bit funny, I'm not sure how it's better than just using git directly :) But I haven't looked into it too much so might be wrong! Do tell me if you think otherwise 😉

Yeah, now that you mention it, I think using git directly would be easier and more straightforward.
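Pushing the generated plot with plain git from the workflow might look roughly like this (a sketch, not tested against this repo; the branch layout, plot path, and commit message are assumptions):

```yaml
- name: Publish plot to gh-pages
  run: |
    git config user.name "github-actions[bot]"
    git config user.email "github-actions[bot]@users.noreply.github.com"
    git fetch origin gh-pages
    git checkout gh-pages
    cp /tmp/daily_scores.png plots/
    git add plots/daily_scores.png
    git commit -m "Update daily humpback score plot" || echo "nothing to commit"
    git push origin gh-pages
```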

Slightly over an hour is actually good enough for 24 hours of data. But you're most likely using a GPU (right?), which we won't have in the GH Actions VM, so that's my concern. We won't know until we try though 😅

Good point. Yes, I'm currently testing with a GPU, but since it's only performing inference, I don't think it will be drastically slower on a CPU. I intend to benchmark this soon, but based on my prior experience with ResNet-50 (which the humpback model uses), it should not take too much longer. I think the main slowdown right now is that I'm processing each timestamp in a pythonic for loop, which makes it agonizingly slow.
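Replacing the per-timestamp Python loop with batched inference mostly needs a chunking helper; the model can then be called once per batch instead of once per timestamp (a sketch):

```python
def batches(items, size):
    """Yield consecutive chunks of at most `size` items for batched model calls."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

For example, `list(batches(list(range(10)), 4))` gives `[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]`.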

valentina-s commented 2 years ago

@Benjamintdk I have added an empty gh-pages branch and have activated it.