kitzeslab / bioacoustics-model-zoo

Pre-trained models for bioacoustic classification tasks

HawkEars predict() has scores from audio clips that did not exist #16

Open sammlapp opened 5 days ago

sammlapp commented 5 days ago

predict() and embed() outputs for rows where the audio file did not exist contain copies of scores from some other clip. When batching samples, we copy another clip into the place of each missing clip to avoid errors with N/A values, but those rows need to be replaced with NaN in the final score DataFrame!
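To illustrate the intended behavior, here is a minimal sketch of the "substitute a placeholder, then mask it out" pattern described above. It is not the model zoo's actual implementation: `score_batch`, `toy_model`, and the convention of marking a missing clip with `None` are all hypothetical.

```python
import numpy as np

def score_batch(clips, model_fn):
    """Hypothetical sketch: score clips where missing files are given as None."""
    # Boolean flag per clip: True where the audio file was missing
    missing = np.array([c is None for c in clips])
    if missing.all():
        raise ValueError("no valid clips in batch")
    # Use a copy of the first valid clip as a placeholder so the batch can be stacked
    filler = next(c for c in clips if c is not None)
    batch = np.stack([filler if c is None else c for c in clips])
    scores = model_fn(batch).astype(float)
    # The step that matters here: placeholder rows must not keep the copied scores
    scores[missing] = np.nan
    return scores

def toy_model(batch):
    # Stand-in "model": one score per clip, the mean sample value
    return batch.mean(axis=1, keepdims=True)

clips = [np.ones(4), None, np.full(4, 3.0)]
print(score_batch(clips, toy_model))
```

Without the `scores[missing] = np.nan` line, the second row would silently report the first clip's score, which is the symptom described in this issue.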

As a temporary workaround, please replace scores with NaN for any rows in the output where start_time is NaN:

```python
import numpy as np

hawkears = opensoundscape.ml.bioacoustics_model_zoo.Hawkears()
preds = hawkears.predict(...)
# rows for missing audio files have NaN start_time in the index
nan_mask = preds.index.get_level_values('start_time').isna()
preds.loc[nan_mask] = np.nan

# same for embeddings:
emb = hawkears.embed(...)
nan_mask = emb.index.get_level_values('start_time').isna()
emb.loc[nan_mask] = np.nan
```
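The same workaround can be wrapped in a small helper and checked on a toy DataFrame. This is a minimal sketch assuming the output is a pandas DataFrame whose MultiIndex has a `start_time` level, as the workaround above relies on; the helper name `mask_missing_clips`, the file names, the column name, and the scores are hypothetical.

```python
import numpy as np
import pandas as pd

def mask_missing_clips(df):
    """Return a copy of df with every row whose start_time is NaN set to NaN."""
    out = df.copy()
    nan_mask = out.index.get_level_values("start_time").isna()
    out.loc[nan_mask] = np.nan
    return out

# Toy output shaped like a predict() result; "missing.wav" carries a copied score
idx = pd.MultiIndex.from_tuples(
    [("a.wav", 0.0, 3.0), ("missing.wav", np.nan, np.nan), ("b.wav", 0.0, 3.0)],
    names=["file", "start_time", "end_time"],
)
preds = pd.DataFrame({"species_x": [0.9, 0.9, 0.1]}, index=idx)
print(mask_missing_clips(preds))
```

Working on a copy keeps the raw output available for comparison; assigning in place, as in the snippet above, avoids the extra memory for large score tables.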

This does not occur with the CNN() class, so it seems specific to something in the model zoo.

sammlapp commented 5 days ago

Cannot reproduce... hmm, not sure how it happened for me the first time.