kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License

Many overlap_fractions don't produce results #945

Open rhine3 opened 7 months ago

rhine3 commented 7 months ago

When calling m.predict(..., overlap_fraction=0.4), certain values of overlap_fraction raise a ValueError due to a dimension mismatch. In my testing, the input is one sample shorter than the model expects (143999 instead of 144000). Probably a rounding error somewhere?

This happens while running BirdNET from the model zoo, e.g. m.predict(["../../data/wildtrax_oven/recordings_alloven_wav/100032.wav"], overlap_fraction=0.66), which should return a dataframe of per-class scores but raises the error below instead.

Fractions that work: 0.24, 0.25, 0.5, 0.75

Fractions that don't work: 0.1, 0.66, 0.9

Full traceback:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 m.predict(["../../data/wildtrax_oven/recordings_alloven_wav/100032.wav"], overlap_fraction=0.66)

File /bgfs/jkitzes/ter38/.conda/envs/scc/lib/python3.9/site-packages/opensoundscape/ml/cnn.py:231, in BaseClassifier.predict(self, samples, batch_size, num_workers, activation_layer, split_files_into_clips, overlap_fraction, final_clip, bypass_augmentations, invalid_samples_log, raise_errors, wandb_session, return_invalid_samples, progress_bar, **kwargs)
    220     wandb_session.log(
    221         {
    222             "Samples / Preprocessed samples": wandb_table(
   (...)
    226         }
    227     )
    229 ### Prediction/Inference ###
    230 # iterate dataloader and run inference (forward pass) to generate scores
--> 231 pred_scores = self.__call__(dataloader, wandb_session, progress_bar)
    233 ### Apply activation layer ### #TODO: test speed vs. doing it in __call__ on batches
    234 pred_scores = apply_activation_layer(pred_scores, activation_layer)

File ~/.cache/torch/hub/kitzeslab_bioacoustics-model-zoo_main/bioacoustics_model_zoo/birdnet.py:123, in BirdNET.__call__(self, dataloader, return_embeddings, return_logits, **kwargs)
    120 for batch in tqdm(dataloader):
    121     for audio in batch:  # no batching, one by one?
    122         # using chirp repo code here:
--> 123         self.network.set_tensor(
    124             input_details["index"], np.float32(audio)[np.newaxis, :]
    125         )
    126         self.network.invoke()
    127         logits.extend(self.network.get_tensor(output_details["index"]))

File /bgfs/jkitzes/ter38/.conda/envs/scc/lib/python3.9/site-packages/tensorflow/lite/python/interpreter.py:720, in Interpreter.set_tensor(self, tensor_index, value)
    704 def set_tensor(self, tensor_index, value):
    705   """Sets the value of the input tensor.
    706 
    707   Note this copies data in `value`.
   (...)
    718     ValueError: If the interpreter could not set the tensor.
    719   """
--> 720   self._interpreter.SetTensor(tensor_index, value)

ValueError: Cannot set tensor: Dimension mismatch. Got 143999 but expected 144000 for dimension 1 of input 0.

sammlapp commented 7 months ago

This probably stems from a rounding or floating point issue when loading audio in Audio.from_file https://github.com/kitzeslab/opensoundscape/blob/a04ef97d2edd9d882401763a4c4e97f425ed9b00/opensoundscape/audio.py#L307-L308
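
To illustrate how a floating point issue could drop exactly one sample, here is a minimal sketch. It is not the code in audio.py; it assumes 3-second clips at 48 kHz (which matches the expected 144000 samples), that clip duration is computed as end - start, and that the sample count is obtained by truncating duration * sample_rate. The function name clip_sample_counts and its parameters are hypothetical.

```python
def clip_sample_counts(overlap_fraction, clip_duration=3.0, sample_rate=48000,
                       n_clips=50, to_int=int):
    """Return the sample count of each clip under the assumed loading logic.

    Assumption (not verified against opensoundscape): the loader computes
    duration = end - start and converts duration * sample_rate to an integer.
    """
    step = clip_duration * (1 - overlap_fraction)  # time between clip starts
    counts = []
    for k in range(n_clips):
        start = k * step
        end = start + clip_duration
        duration = end - start  # may land slightly below 3.0 in floating point
        counts.append(to_int(duration * sample_rate))  # int() truncates
    return counts


expected = 144000
for fraction in (0.25, 0.5, 0.66, 0.9):
    short = sum(n != expected for n in clip_sample_counts(fraction))
    print(fraction, "clips with unexpected length:", short)
```

Under this assumption, passing to_int=round instead of int always yields 144000 samples, which suggests that rounding rather than truncating the sample count would avoid the off-by-one.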

rhine3 commented 6 months ago

A quick patch for this that Sam and I worked out is to modify the preprocessor of the loaded model, for example:

import torch
from opensoundscape.preprocess.actions import AudioTrim

# load BirdNET from the bioacoustics model zoo
birdnet = torch.hub.load('kitzeslab/bioacoustics-model-zoo', 'BirdNET', trust_repo=True)

# right after loading, trim or extend each clip to its target duration
pre = birdnet.preprocessor
pre.insert_action(
    action_index='extend_audio',
    after_key='load_audio',
    action=AudioTrim(extend=True),
)
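
The idea behind the patch can be sketched with plain NumPy: force every clip to the exact length the model expects by trimming extra samples or padding short clips. This is only an illustration, not the AudioTrim implementation; the helper name fix_length is hypothetical, and padding with zeros is an assumption made here.

```python
import numpy as np


def fix_length(samples, target_length):
    """Trim or zero-pad a 1-D array so it has exactly target_length samples."""
    samples = np.asarray(samples)
    if len(samples) >= target_length:
        return samples[:target_length]  # drop any extra samples at the end
    # pad the end with zeros (np.pad's default constant mode)
    return np.pad(samples, (0, target_length - len(samples)))


# a clip that is one sample short, as in the traceback above
short_clip = np.ones(143999, dtype=np.float32)
fixed = fix_length(short_clip, 144000)
print(fixed.shape)
```

Padding a single sample changes the audio negligibly, so this kind of fix is a reasonable stopgap until the underlying rounding is corrected.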