kahst / BirdNET-Analyzer

BirdNET analyzer for scientific audio data processing.

Issues with training a classifier to identify individual frogs #158

Open kahst opened 1 year ago

kahst commented 1 year ago

I am trying to use the Windows GUI version of BirdNET to train my own classifier, in an attempt to identify individual frogs in my recordings.

The species is Allobates femoralis, which has a highly stereotyped call. In previous studies, colleagues and we have already shown that the calls are individually distinct based on simple spectral and temporal call parameters; most of the individual information is contained in the inter-note intervals of the 4-note call.

Gasser H, Amézquita Torres A, Hödl W (2009) Who is calling? Intraspecific call variation in the aromobatid frog Allobates femoralis. Ethology 115:596–607. doi: 10.1111/j.1439-0310.2009.01639.x

Tumulty JP, Pašukonis A, Ringler M, Forester JD, Hödl W, Bee MA (2018) Brilliant-thighed poison frogs do not use acoustic identity information to treat territorial neighbours as dear enemies. Anim Behav 141:203–220. doi: 10.1016/j.anbehav.2018.05.008

Before jumping to field recordings, I first want to investigate whether BirdNET can be trained on, and can subsequently classify, these individual frog calls at all. To this end I am using labelled recordings of 25 individuals, each recorded in its terrarium over several weeks. I created a training dataset of 100 recordings (call bouts consisting of multiple 4-note calls) that I cut out using the Raven band energy detector. From these recordings I isolated a total of 11,083 calls from these 25 individuals. Each call/training recording, consisting of 4 notes (upward sweeps, ~2-4 kHz), is between 0.4 and 0.5 seconds long. The file format is 16 kHz mono, 16-bit WAV, resampled in Audacity from the 48 kHz mono, 24-bit original recordings.
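
For reproducibility, the resampling step could also be scripted instead of done by hand in Audacity. A minimal sketch, assuming librosa and soundfile are available; the folder paths are placeholders:

```python
# Minimal sketch (not BirdNET code): batch-resample the 48 kHz / 24-bit originals to
# 16 kHz / 16-bit mono WAV files, the format used for the training snippets above.
from pathlib import Path

import librosa
import soundfile as sf

SRC_DIR = Path("D:/recordings_48k")   # hypothetical folder with the original recordings
DST_DIR = Path("D:/recordings_16k")   # hypothetical output folder
DST_DIR.mkdir(parents=True, exist_ok=True)

for wav_path in SRC_DIR.glob("*.wav"):
    # librosa loads as mono float32 and resamples to 16 kHz in one step
    audio, sr = librosa.load(wav_path, sr=16000, mono=True)
    # PCM_16 writes 16-bit samples
    sf.write(DST_DIR / wav_path.name, audio, sr, subtype="PCM_16")
    print(f"wrote {wav_path.name} ({len(audio) / sr:.2f} s)")
```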

When I run the training function of the BirdNET GUI version with the default settings, the 25 labels/individuals are found in their folders, but training stops after a few epochs with these last messages:

Epoch 11/100

347/347 [==============================] - 2s 5ms/step - loss: 5.7809 - prec: 0.5430 - val_loss: 5.1439 - val_prec: 0.5214

WARNING:absl:Found untraced functions such as _update_step_xla while saving (showing 1 of 1). These functions will not be directly callable after loading.

...Done. Best top-1 precision: 0.5511953234672546

When I change the training parameters (epochs, learning rate, hidden units), the number of epochs until the process stops varies slightly (9-17), but the final messages remain the same.
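
For completeness, this is roughly the command-line equivalent of what I set up in the GUI, in case that helps with reproducing it. The flag names are taken from my reading of the repository README and may differ between versions; the paths are placeholders:

```python
# Sketch of invoking BirdNET-Analyzer's training script outside the GUI.
# Folder layout assumed: D:/train_data contains one subfolder per individual (label),
# each holding that individual's WAV snippets.
import subprocess

subprocess.run(
    [
        "python", "train.py",
        "--i", "D:/train_data",                      # training data, one folder per label
        "--o", "D:/models/frog_individuals.tflite",  # output classifier file
        "--epochs", "100",
        "--batch_size", "32",
        "--learning_rate", "0.001",
        "--hidden_units", "0",
    ],
    check=True,
)
```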

A *.tflite classifier file is written to the assigned folder and a precision plot is shown in the GUI. However, when I try to use this classifier to analyse my full recordings, I get the following error message for every single file (same audio format):

Error: Cannot analyze audio file D:\To_Classify\frog01_bout020882.wav.

Error: Cannot analyze audio file D:\To_Classify\frog01_bout020899.wav.

Etc.
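
For reference, the command-line equivalent of the analysis I ran in the GUI would look roughly like this; in particular, my understanding is that the custom model is passed to analyze.py via a classifier argument. Flag names may differ between versions and the paths are placeholders:

```python
# Sketch of running the analysis with the newly trained custom classifier (not the
# default BirdNET model). Same caveat as above: flag names are my reading of the README.
import subprocess

subprocess.run(
    [
        "python", "analyze.py",
        "--i", "D:/To_Classify",                              # folder with the full recordings
        "--o", "D:/results",                                  # hypothetical output folder
        "--classifier", "D:/models/frog_individuals.tflite",  # the custom *.tflite model
        "--min_conf", "0.25",
    ],
    check=True,
)
```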

To verify that the audio format is processed correctly, I ran the full BirdNET classifier with all species against my recordings. In that case I receive no error messages, and my frogs, all individuals, are classified as “Spring Peepers” (if I remember right).

Do you have any idea what the issue might be that makes the training apparently stop prematurely and produces a broken classifier? It might be worth noting that I am currently trying to run BirdNET on a rather outdated computer, a dual-CPU Xeon E5620 @ 2.4 GHz from 2009, which already had issues running CUDA (on an Nvidia GTX 2060) TensorFlow for Koogu, as it lacks certain current CPU instruction sets. However, as far as I understand, BirdNET does not use CUDA, so this should not be a problem. Also, the fact that the original BirdNET classifier works fine (although, for obvious reasons, with nonsensical identifications for my recordings) rather tells me that the hardware is not the issue here.
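
If it helps to rule the hardware in or out, a quick generic check like the sketch below (nothing BirdNET-specific) shows whether the installed TensorFlow build starts on this CPU at all and whether it sees a GPU:

```python
# Generic TensorFlow sanity check: on a CPU lacking the instruction sets a given TF build
# was compiled for, even the import typically crashes ("illegal instruction"); otherwise
# this prints the version and any GPUs TensorFlow can see.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```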

Any help would be highly appreciated!

Kind Regards

Max

kahst commented 1 year ago

Max reached out to us via the support email; I'm posting the issue here so that we can look into it.

maxolotl commented 1 year ago

Hello Stefan, thanks for moving this here!

I believe I have in the meantime found the reason for the described issue, and it seems there was no problem with the actual training itself.

When I trained my classifier for the frog individuals, I set the classifier output directory within the folder structure I use to manage the research project related to these classifications. It seems the path to the classifier was too long and/or contained disallowed characters (spaces, hyphens), which made the classification fail once the classifier had been trained.

I have now moved everything (BirdNET-Analyzer, the training files, the files to be classified, and the classifier files and output directory) into the root of my data drive, and all folders have been given names that are no longer than 8 characters and contain only letters, numbers and underscores.
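
For anyone running into the same symptom, a rough check like the sketch below (my own code, not part of BirdNET) can flag paths that are long or contain the kind of characters that seemed to cause the problem for me:

```python
# My own sketch: flag paths that are long or contain characters of the kind that seemed
# to break things here (spaces, hyphens, anything non-alphanumeric).
ALLOWED = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_\\/:.")

def check_path(path: str, max_len: int = 100) -> list[str]:
    problems = []
    if len(path) > max_len:
        problems.append(f"longer than {max_len} characters")
    bad = sorted({c for c in path if c not in ALLOWED})
    if bad:
        problems.append(f"possibly problematic characters: {bad}")
    return problems

# The first path is hypothetical, resembling the kind of project path that caused trouble;
# the second is the short, plain style of path that worked.
for folder in [r"D:\Research Projects\Frog ID - 2023\classifier output", r"D:\frog_cls"]:
    issues = check_path(folder)
    print(folder, "->", issues if issues else "looks fine")
```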

During training, although all parameters were left at their defaults (so "epochs" was set to 100), the process still stopped after 23/100 epochs with the message:

"Epoch 23/100 347/347 [==============================] - 1s 3ms/step - loss: 0.0022 - prec: 0.9986 - val_loss: 0.0014 - val_prec: 1.0000 WARNING:absl:Found untraced functions such as _update_step_xla while saving (showing 1 of 1). These functions will not be directly callable after loading. ...Done. Best top-1 precision: 0.9995489120483398"

[image: precision plot shown in the GUI]

...the stated precision of 0.9995 and the precision graph shown indicate to me that the classifier achieved an extraordinary level of separation on the training data, right? The classification with the new, short paths then ran smoothly, with all files properly analysed. So it would probably be worth checking in the code which folder structures and names/characters are allowed, so that they do not impede classifier training and execution, and then adding a corresponding recommendation/warning to the documentation.
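
On the training still ending at epoch 23 of 100: my assumption (not something I have verified in the BirdNET-Analyzer code) is that this is simply an early-stopping mechanism ending training once the validation loss stops improving, which would also explain why the stopping epoch varied between runs earlier. A minimal Keras sketch of that mechanism, with placeholder data:

```python
# Illustration only (assumed behaviour, not BirdNET code): an EarlyStopping callback ends
# training as soon as the validation loss has not improved for `patience` epochs, so a run
# configured for 100 epochs can legitimately finish after 9, 17, or 23 of them.
import numpy as np
import tensorflow as tf

# Toy stand-ins for the real feature vectors and the 25 individual labels
x = np.random.rand(1000, 320).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 25, size=1000), num_classes=25)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(320,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(25, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)
history = model.fit(x, y, validation_split=0.2, epochs=100, callbacks=[early_stop])
print("stopped after", len(history.history["loss"]), "of 100 epochs")
```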

On the downside: despite the apparently very high precision during training, this very first classifier run yielded thoroughly unsatisfying results on the labelled test data, which come from the same 25 frogs the classifier had been trained on (but from other, independent recordings). Almost none of the test recordings were classified correctly; frog calls were recognised within the recordings, but ~98% of them were assigned to the wrong individual, with a very strong overrepresentation of only 2 individuals. Do you have any idea how to proceed from here, particularly given the combination of very high training precision but abysmal classification results?
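
One thing that might be worth checking (a guess on my side) is how the training/validation split is made: if snippets cut from the same call bout end up in both the training and the validation set, the near-perfect precision above would mostly reflect memorising individual recordings rather than recognising individuals on new ones. Below is a sketch of a split that keeps whole bouts together, assuming the snippet filenames encode the bout ID in the style of frog01_bout020882.wav:

```python
# Sketch: group-aware train/validation split so that all snippets from one call bout stay
# in the same split. Filename pattern and folder layout are assumptions, not BirdNET's.
from pathlib import Path

from sklearn.model_selection import GroupShuffleSplit

files = sorted(Path("D:/train_data").rglob("*.wav"))  # hypothetical training folder
labels = [f.parent.name for f in files]                # one subfolder per individual
groups = [f.stem.split("_")[1] for f in files]         # e.g. "bout020882" as grouping key

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(files, labels, groups))
print(f"{len(train_idx)} training snippets, {len(val_idx)} validation snippets, "
      f"{len(set(groups))} bouts in total")
```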

Cheers

Max