measure of classification uncertainty

vocalpy / hybrid-vocal-classifier

a Python machine learning library for animal vocalizations and bioacoustics

BSD 3-Clause "New" or "Revised" License


It'd be great to have some measure of classification uncertainty, so that dodgy syllables can be filtered out of analyses post hoc. One idea is to have sklearn return the class probabilities for each syllable, and then calculate some measure of disorder across that vector. In some recent runs, I've been using the entropy of the class probabilities, and it seems to perform reasonably well.
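To make the idea concrete, here is a minimal sketch of the entropy calculation, not the code from the runs mentioned above. It assumes the probabilities arrive as an array of shape `(n_syllables, n_classes)`, which is what `predict_proba` returns for scikit-learn classifiers that support it (for `SVC`, that requires `probability=True`). The function name `prediction_entropy` and the example values are made up for illustration.

```python
import numpy as np


def prediction_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of each row of class probabilities.

    Hypothetical helper: `probs` is assumed to have shape
    (n_syllables, n_classes), with each row summing to 1.
    """
    probs = np.asarray(probs, dtype=float)
    # Clip before taking the log so zero-probability classes do not
    # produce log(0); their contribution to the sum is still zero.
    clipped = np.clip(probs, eps, 1.0)
    return -np.sum(probs * np.log(clipped), axis=1)


# Made-up probabilities standing in for clf.predict_proba(features).
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction: low entropy
    [0.40, 0.35, 0.25],   # ambiguous prediction: high entropy
    [1 / 3, 1 / 3, 1 / 3],  # uniform: maximum entropy, log(n_classes)
])
entropy = prediction_entropy(probs)
```

Because the maximum possible entropy is `log(n_classes)`, dividing by that value would give a score between 0 and 1 that is comparable across models with different numbers of syllable classes.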

Lowering the entropy threshold below which syllables are retained appears to progressively remove poorly classified syllables, which show up in the attached plot as points outside the main duration/spectral centroid cloud.

[Attached plot: pred_ent_threshold_series_class]
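The threshold sweep described above could look roughly like the following. This is only a sketch: the entropy values and thresholds are made up, and `retained_mask` is a hypothetical helper, not part of hybrid-vocal-classifier.

```python
import numpy as np


def retained_mask(entropy, threshold):
    """Boolean mask of syllables whose prediction entropy is at or below
    the threshold (hypothetical helper for post-hoc filtering)."""
    return np.asarray(entropy, dtype=float) <= threshold


# Made-up per-syllable entropies (nats); real values would come from
# the classifier's predicted class probabilities.
entropy = np.array([0.05, 0.15, 0.30, 0.60, 0.95, 1.05])

# Lowering the threshold keeps progressively fewer syllables.
for threshold in (1.0, 0.5, 0.2):
    keep = retained_mask(entropy, threshold)
    print(f"threshold={threshold}: kept {keep.sum()} of {keep.size}")
```

The same mask can index any per-syllable array (labels, durations, spectral centroids), so the filtered and unfiltered distributions can be plotted side by side.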

This was the first measure that came to mind, so there might be something that performs better out there.

measure of classification uncertainty #45