kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License

add support for Perch and YAMNet feature embedding #816

Closed sammlapp closed 1 year ago

sammlapp commented 1 year ago

Perch, YAMNet, and BirdNET offer three ways of creating feature embeddings from audio. Perch (trained on Xeno-canto) and BirdNET (trained on Xeno-canto, the Macaulay Library, and more) are based on supervised classifiers, while YAMNet is trained on AudioSet, a large corpus of YouTube audio.

Perch is available on TensorFlow Hub and is used in the chirp repo.

YAMNet has a tutorial and is also available on TensorFlow Hub.

sammlapp commented 1 year ago

These will end up in the bioacoustics-model-zoo rather than in opensoundscape, but PR #835 provides a parent class, BaseClassifier, that can be subclassed for access to .predict() functionality matching the current CNN class. These models will support inference only.
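To illustrate the pattern described above, here is a minimal, self-contained sketch of a parent class exposing a shared .predict() that subclasses fill in with model-specific inference. All names here (BaseClassifier's internals, ToyModel, _run_inference) are hypothetical stand-ins, not the actual opensoundscape or model-zoo API:

```python
class BaseClassifier:
    """Parent class providing a shared .predict() interface (illustrative only)."""

    classes = []  # subclasses define their output classes

    def _run_inference(self, samples):
        # subclasses supply model-specific inference here
        raise NotImplementedError

    def predict(self, samples):
        # run subclass inference and return per-sample, per-class scores
        scores = self._run_inference(samples)
        return {s: dict(zip(self.classes, row)) for s, row in zip(samples, scores)}


class ToyModel(BaseClassifier):
    """Stand-in for a real model such as Perch or BirdNET."""

    classes = ["song", "call"]

    def _run_inference(self, samples):
        return [[0.1, 0.9] for _ in samples]  # placeholder scores


result = ToyModel().predict(["test.wav"])
```

Subclasses only implement inference; the shared interface is what lets the model-zoo models match the current CNN class's .predict() behavior.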

sammlapp commented 1 year ago

This is now supported via the bioacoustics-model-zoo.

Note that these models are from TensorFlow Hub and require a python environment with TensorFlow.
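Since the thread does not pin versions or name exact packages beyond TensorFlow and torch, the following environment setup is an unverified sketch:

```shell
# Hypothetical setup for running the model-zoo examples below;
# package list and versions are assumptions, not from the thread.
python -m venv zoo-env
source zoo-env/bin/activate
pip install tensorflow tensorflow-hub torch opensoundscape
```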

Copying from the readme, here is an example with Perch:

import torch
model = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'Perch')
predictions = model.predict(['test.wav'])  # predict on the model's classes
embeddings = model.generate_embeddings(['test.wav'])  # embeddings for each 5 sec of audio

BirdNET:

import torch
m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings

and YAMNet:

import torch
m = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'YAMNet')
m.predict(['test.wav'])  # returns dataframe of per-class scores
m.generate_embeddings(['test.wav'])  # returns dataframe of embeddings