The predict() and embed() outputs for rows whose audio file does not exist contain copies of scores from some other clip. When batching samples, we copy other clips into the place of missing clips to avoid errors from the missing (N/A) samples, but the scores for those rows need to be replaced with NA in the final score DataFrame!
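Here is a minimal pure-Python sketch of the substitute-then-mask idea described above. It assumes a hypothetical `load_clip` that returns `None` for a missing file and a toy `score` function standing in for the model. It is not opensoundscape's actual batching code. It only shows where the NaN replacement would need to happen so that copied scores never reach the output.

```python
import math

# Hypothetical stand-in: a "clip" is a list of floats, and a missing
# audio file is represented by a path that is absent from audio_store.
def load_clip(path, audio_store):
    return audio_store.get(path)

def build_batch(paths, audio_store):
    """Load clips, substituting a valid clip for any missing one.

    Returns the batch plus a list of booleans marking which entries
    are placeholders, so their outputs can be discarded later.
    """
    clips = [load_clip(p, audio_store) for p in paths]
    valid = [c for c in clips if c is not None]
    if not valid:
        raise ValueError("no valid clips in batch")
    placeholder = valid[0]
    is_placeholder = [c is None for c in clips]
    batch = [placeholder if c is None else c for c in clips]
    return batch, is_placeholder

def score(clip):
    # Toy "model": the mean of the samples.
    return sum(clip) / len(clip)

def predict(paths, audio_store):
    batch, is_placeholder = build_batch(paths, audio_store)
    scores = [score(c) for c in batch]
    # The step this issue says is missing: overwrite scores computed on
    # placeholder clips with NaN instead of returning the copied scores.
    return [math.nan if p else s for s, p in zip(scores, is_placeholder)]
```

For example, `predict(["a.wav", "missing.wav", "b.wav"], {"a.wav": [1.0, 3.0], "b.wav": [4.0, 6.0]})` scores the missing entry using a copy of `a.wav` internally but reports NaN for it. The key design point is that the placeholder flags are recorded at batching time, because after the model runs, nothing distinguishes a copied score from a real one.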
As a temporary workaround, please replace the scores with NaN for any rows in the output where `start_time` is NaN:

```python
import numpy as np
import opensoundscape

hawkears = opensoundscape.ml.bioacoustics_model_zoo.Hawkears()

# predictions: rows for missing files have NaN in the start_time index level
preds = hawkears.predict(...)
nan_mask = preds.index.get_level_values('start_time').isna()
preds.loc[nan_mask] = np.nan

# same for embeddings:
emb = hawkears.embed(...)
nan_mask = emb.index.get_level_values('start_time').isna()
emb.loc[nan_mask] = np.nan
```
This does not occur with the CNN() class, so it seems specific to something in the model zoo.