Closed — sammlapp closed this issue 1 year ago
These will end up in the bioacoustics-model-zoo rather than in opensoundscape, but PR #835 provides a parent class, BaseClassifier, that can be sub-classed to get .predict() functionality matching the current CNN class. These models will support inference only.
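To illustrate the sub-classing pattern described above, here is a minimal sketch. The `BaseClassifier` and `ConstantModel` classes below are stand-ins written for this example, not the actual opensoundscape implementation; only the general shape (subclass supplies per-clip scoring, parent provides `.predict()` returning a per-class score dataframe) is taken from the description.

```python
# Hypothetical sketch of the BaseClassifier sub-classing pattern.
# These classes are illustrative stand-ins, not the real opensoundscape API.
import pandas as pd


class BaseClassifier:
    """Parent class providing a shared .predict() interface (inference only)."""

    classes = []

    def _score_clip(self, path):
        # each subclass supplies its own model-specific scoring
        raise NotImplementedError

    def predict(self, files):
        # return a DataFrame of per-class scores, one row per file
        rows = [self._score_clip(f) for f in files]
        return pd.DataFrame(rows, index=files, columns=self.classes)


class ConstantModel(BaseClassifier):
    """Toy subclass: scores every class as 0.5, just to show the pattern."""

    classes = ["a", "b"]

    def _score_clip(self, path):
        return [0.5, 0.5]


scores = ConstantModel().predict(["test.wav"])
print(scores.shape)  # (1, 2)
```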
This is now supported via the bioacoustics-model-zoo.
Note that these models come from TensorFlow Hub and require a Python environment with TensorFlow installed.
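A minimal environment setup might look like the following; exact package versions and whether you need `tensorflow-hub` explicitly depend on the model-zoo release you install, so treat this as a starting point rather than a pinned recipe.

```shell
# install torch (for torch.hub loading) plus the TensorFlow runtime
# that the TF Hub models need; versions are left unpinned here
pip install torch tensorflow tensorflow-hub
```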
Copying from the README, here is an example with Perch:

```python
import torch

model = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'Perch')
predictions = model.predict(['test.wav'])  # predict on the model's classes
embeddings = model.generate_embeddings(['test.wav'])  # generate embeddings for each 5 sec of audio
```
BirdNET:

```python
import torch

m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings
```
and YAMNet:

```python
import torch

m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'YAMNet')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings
```
Perch, YAMNet, and BirdNET offer three ways of creating feature embeddings from audio. Perch (trained on Xeno-canto) and BirdNET (trained on Xeno-canto, the Macaulay Library, and more) are based on supervised bird-sound classifiers, while YAMNet was trained on the general-purpose AudioSet corpus of YouTube audio.
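A common use of these embeddings is to train a small downstream classifier for a custom task. The sketch below uses random arrays as stand-ins for the dataframe returned by `generate_embeddings()` (the embedding dimension varies by model), and fits a scikit-learn logistic regression on them; the shapes and labels are illustrative assumptions, not real model output.

```python
# Sketch: fit a lightweight classifier on audio embeddings.
# The "embeddings" here are random stand-ins for generate_embeddings() output.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_clips, embed_dim = 20, 128  # embedding dimension varies by model
embeddings = rng.normal(size=(n_clips, embed_dim))
labels = np.array([0, 1] * (n_clips // 2))  # hypothetical presence/absence labels

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
acc = clf.score(embeddings, labels)  # training accuracy, just a sanity check
print(acc)
```

In practice you would replace the random arrays with the dataframe rows returned by `generate_embeddings()` and evaluate on held-out clips rather than the training set.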
Links:
- Perch: available on TensorFlow Hub; used in the chirp repo
- YAMNet: tutorial, and TensorFlow Hub