vocalpy / hybrid-vocal-classifier

a Python machine learning library for animal vocalizations and bioacoustics
http://hybrid-vocal-classifier.readthedocs.io
BSD 3-Clause "New" or "Revised" License

measure of classification uncertainty #45

Open bradleycolquitt opened 6 years ago

bradleycolquitt commented 6 years ago

It'd be great to have some measure of classification uncertainty to be able to post-hoc filter out dodgy syllables from analyses. One idea is to have sklearn return the classification probabilities, and then calculate some measure of disorder across this vector. In some recent runs, I've been using the entropy of the classification probabilities, and it seems to perform reasonably well.
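For concreteness, here is a minimal sketch of the entropy measure described above. It is not the code used for the runs mentioned in this comment. It assumes each row is a vector of class probabilities for one syllable, such as the rows returned by a scikit-learn classifier's `predict_proba`. The function names and the example probabilities are made up for illustration.

```python
import math


def prediction_entropy(probs, base=2.0):
    """Shannon entropy of one vector of class probabilities.

    Low entropy means the probability mass sits on one class
    (a confident prediction); high entropy means it is spread out.
    """
    # Zero-probability classes are skipped: p * log(p) -> 0 as p -> 0.
    return -sum(p * math.log(p, base) for p in probs if p > 0)


def prediction_entropies(prob_rows, base=2.0):
    """Entropy for each row of an (n_syllables x n_classes) table."""
    return [prediction_entropy(row, base) for row in prob_rows]


# Hypothetical probabilities for three syllables and three classes.
probs = [
    [0.98, 0.01, 0.01],   # confident prediction
    [0.50, 0.50, 0.00],   # torn between two classes
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain
]

ent = prediction_entropies(probs)
for row, h in zip(probs, ent):
    print(f"{row} -> {h:.3f} bits")
```

With base 2 the entropy is in bits and ranges from 0 to log2(n_classes), so a threshold would need to be chosen with the number of syllable classes in mind.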

Lowering the entropy threshold at which syllables are retained appears to progressively remove poorly classified syllables, which show up in the attached plot as points outside the main duration/spectral centroid cloud.

[Attached plot: pred_ent_threshold_series_class]
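The threshold series can be sketched roughly as follows. This is an illustration only, not the code behind the plot. The entropy values and thresholds are invented, and retaining syllables at or below the threshold (rather than strictly below) is an assumption.

```python
def retain_below(entropies, threshold):
    """Indices of syllables whose prediction entropy is <= threshold."""
    return [i for i, h in enumerate(entropies) if h <= threshold]


# Hypothetical per-syllable entropies, in bits.
example_entropies = [0.05, 0.16, 0.40, 0.90, 1.00, 1.58]

# Progressively stricter thresholds, as in a threshold series.
thresholds = [1.5, 1.0, 0.5, 0.1]

kept_by_threshold = {
    t: retain_below(example_entropies, t) for t in thresholds
}

for t in thresholds:
    kept = kept_by_threshold[t]
    print(f"threshold {t}: kept {len(kept)} of {len(example_entropies)}")
```

The retained indices could then be used to subset whatever features are being plotted, for example duration and spectral centroid.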

This was the first measure that came to mind, so there might be something that performs better out there.

NickleDave commented 6 years ago

Very much agree that some sort of measure to filter out "not well classified" segments is needed. Can you share the code you used to generate the plot? Maybe as a Jupyter notebook?

If you are interested in contributing, I would definitely be willing to incorporate this measure, e.g. in hvc.utils.metrics, at least as a starting point while we figure out whether other metrics perform better.