mcguirepr89 / BirdNET-Pi

A realtime acoustic bird classification system for the Raspberry Pi 4B, 3B+, and 0W2 built on the TFLite version of BirdNET.

Spectrogram optimization #33

Closed DD4WH closed 2 years ago

DD4WH commented 2 years ago

Proposal to change spectrogram.sh

sudo -u pi sox "${analyzing_now}" -n remix 1 rate 16k spectrogram -t "Currently Analyzing" -c "${analyzing_now}" -o "${spectrogram_png}"

mcguirepr89 commented 2 years ago

Thank you for this proposal -- I will test it out and get back to you soon.

mcguirepr89 commented 2 years ago

This will be changed in the next version -- I have opted for 20k, as I noticed some truncated Northern Cardinal calls.

DD4WH commented 2 years ago

Thanks! Would also be good to do the same in the extraction script for the extraction spectrograms.

mcguirepr89 commented 2 years ago

I am actually still messing with this -- it seems several of the species on my patio hit frequencies over 8 kHz, so I've been moving it up and up and have found that "24k" (showing up to 12 kHz) displays the full sound better. I don't know what I'm doing with these spectrograms at all from a biological/conservationist/ornithological perspective, so I'm just going off of aesthetics. If you can help me better understand the significance of the frequency range, I may have more to go off of than just aesthetics.

What do you think? Is showing up to 12 kHz too much extra frequency range?

By the way, I'm making these changes to both the "Spectrogram View" and the Extracted spectrograms as I go.

DD4WH commented 2 years ago

My main point with this proposal was to optimize three things:

As far as I understand the approach in sox, the audio is resampled to the sample rate specified in the "rate 16k spectrogram" part of the call. This leads to a maximum frequency of 8 kHz (it's always half the sample rate). Because the audio is resampled to the desired sample rate before the FFTs for the spectrogram are calculated, one gets the optimum frequency resolution. For example, if sox used a 1024-point FFT (not sure which FFT size sox really uses), the frequency resolution of the spectrogram at a 24 kHz sample rate would be 24000 Hz / 1024 = 23.4 Hz. If we had used the original 48 kHz sample rate of the audio without resampling, we would have had 48000 Hz / 1024 = 46.9 Hz resolution. So, the lower the sample rate, the finer the resolution (when keeping the FFT size constant).
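The arithmetic above can be checked with a small shell sketch. The function names bin_width_hz and nyquist_hz are made up for illustration, and the 1024-point FFT size is only the assumed value from the example, not necessarily what sox uses.

```shell
#!/bin/sh
# Hypothetical helpers, not part of sox or BirdNET-Pi.

# bin_width_hz RATE_HZ FFT_SIZE
# Prints the width of one FFT bin in Hz (sample rate / FFT size), one decimal.
bin_width_hz() {
    awk -v rate="$1" -v n="$2" 'BEGIN { printf "%.1f\n", rate / n }'
}

# nyquist_hz RATE_HZ
# Prints the highest frequency the spectrogram can show (half the sample rate).
nyquist_hz() {
    echo $(( $1 / 2 ))
}

# Compare the sample rates discussed in this thread, assuming a 1024-point FFT.
for rate in 16000 24000 48000; do
    echo "rate=${rate} Hz  max=$(nyquist_hz "$rate") Hz  bin=$(bin_width_hz "$rate" 1024) Hz"
done
```

With the FFT size held constant, halving the sample rate halves both the maximum displayed frequency and the bin width.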

I am perfectly OK with your approach of using 24k == 12kHz max frequency in the spectrograms!

Most birds do not have fundamental frequencies above 8 kHz, but some sing a little higher (this also depends on where you are in the world), and some of the overtones may be higher still. But the main information is mostly below 8-10 kHz. Having the spectrogram end at 12 kHz (rather than 24 kHz, which was the max frequency in past versions) doubles the resolution in the spectrogram, which is very nice!

Spectrograms can be very useful, I think. For example, when looking for Eurasian Pygmy Owl in audio files, I always look at the spectrogram first and listen to the file only if the visual inspection has revealed a high probability for the species.

mcguirepr89 commented 2 years ago

Thank you as always for your erudite and very helpful info for these spectrograms! The overtones are certainly what I'm seeing get truncated, as the sound itself hits much lower frequencies -- this makes much more sense to me, and I think it is in a good place now. I'm hoping that folks will reference this for their specific needs, so that if they're hunting for a particularly guttural call or a squeaky chirp, they can adjust the sox command to home in more easily on what they're truly after.
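For anyone adjusting this for their own needs, here is a rough sketch of picking the sox rate from the highest frequency you want to see. The helper names and file names are hypothetical, and the function only prints the command rather than running it. It reuses the "remix 1", "rate" and "spectrogram -o" parts of the command proposed at the top of this issue and leaves out the title and comment options.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of BirdNET-Pi.

# rate_for_max_freq MAX_HZ
# Prints the sample rate (Hz) whose upper limit (half the rate) equals MAX_HZ.
rate_for_max_freq() {
    echo $(( $1 * 2 ))
}

# spectrogram_cmd INPUT OUTPUT MAX_HZ
# Prints (does not run) a sox command that resamples the first channel so the
# spectrogram tops out at MAX_HZ.
spectrogram_cmd() {
    printf 'sox "%s" -n remix 1 rate %s spectrogram -o "%s"\n' \
        "$1" "$(rate_for_max_freq "$3")" "$2"
}

# Example: show up to 12 kHz, which corresponds to the "24k" setting.
spectrogram_cmd recording.wav spectrogram.png 12000
```

A lower maximum, such as 8000, gives finer detail for low-pitched calls, while a higher one keeps more of the overtones in view.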

I'm closing this -- thanks for the suggestion!