kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License

add support for Perch and YAMNet feature embedding #816

Closed sammlapp closed 1 year ago

sammlapp commented 1 year ago

Perch, YAMNet, and BirdNET offer three ways of creating feature embeddings from audio. Perch (trained on Xeno-canto) and BirdNET (trained on Xeno-canto, the Macaulay Library, and more) are based on supervised classifiers, while YAMNet is trained on AudioSet, a large corpus of YouTube audio.

Perch is available on TensorFlow Hub and is used in the chirp repo.

YAMNet has a tutorial and is also available on TensorFlow Hub.

sammlapp commented 1 year ago

These will end up in the bioacoustics-model-zoo rather than in opensoundscape, but PR #835 provides a parent class, BaseClassifier, that can be subclassed for access to .predict() functionality matching the current CNN class. These models will support inference only.
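To illustrate the pattern described above, here is a minimal, self-contained sketch of a parent class exposing a shared .predict() that subclasses fill in with model-specific inference. All names here (BaseClassifier's internals, ToyModel, _run_inference) are hypothetical stand-ins, not the actual opensoundscape or model-zoo API:

```python
class BaseClassifier:
    """Parent class providing a shared .predict() interface (illustrative only)."""

    classes = []  # subclasses define their output classes

    def _run_inference(self, samples):
        # subclasses supply model-specific inference here
        raise NotImplementedError

    def predict(self, samples):
        # run subclass inference and return per-sample, per-class scores
        scores = self._run_inference(samples)
        return {s: dict(zip(self.classes, row)) for s, row in zip(samples, scores)}


class ToyModel(BaseClassifier):
    """Stand-in for a real model such as Perch or BirdNET."""

    classes = ["song", "call"]

    def _run_inference(self, samples):
        return [[0.1, 0.9] for _ in samples]  # placeholder scores


result = ToyModel().predict(["test.wav"])
```

Subclasses only implement inference; the shared interface is what lets the model-zoo models match the current CNN class's .predict() behavior.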

sammlapp commented 1 year ago

This is now supported via the bioacoustics-model-zoo.

Note that these models are from TensorFlow Hub and require a python environment with TensorFlow.
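Since the thread does not pin versions or name exact packages beyond TensorFlow and torch, the following environment setup is an unverified sketch:

```shell
# Hypothetical setup for running the model-zoo examples below;
# package list and versions are assumptions, not from the thread.
python -m venv zoo-env
source zoo-env/bin/activate
pip install tensorflow tensorflow-hub torch opensoundscape
```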

Copying from the readme, here is an example with Perch:

import torch
model = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'Perch')
predictions = model.predict(['test.wav'])  # predict on the model's classes
embeddings = model.generate_embeddings(['test.wav'])  # embeddings for each 5 sec of audio

BirdNET:

import torch
m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings

and YAMNet:

import torch
m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'YAMNet')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings