Audio buffer is not finite everywhere

William-N-Havard commented 2 years ago

I sometimes run into errors such as the one below. I don't really understand why: the file sounds normal when I listen to it, and the VTC and VCM work perfectly on it.

  Traceback (most recent call last):
    File "/scratch2/whavard/PACKAGES/ALICE/SylNet/run_SylNet.py", line 104, in <module>
      X[i] = np.transpose(20*np.log10(librosa.feature.melspectrogram(y=y, sr=Fs, n_mels=24, n_fft=w_l, hop_length=w_h)))
    File "/scratch2/whavard/.conda/envs/ALICE/lib/python3.6/site-packages/librosa/feature/spectral.py", line 2004, in melspectrogram
      pad_mode=pad_mode,
    File "/scratch2/whavard/.conda/envs/ALICE/lib/python3.6/site-packages/librosa/core/spectrum.py", line 2519, in _spectrogram
      pad_mode=pad_mode,
    File "/scratch2/whavard/.conda/envs/ALICE/lib/python3.6/site-packages/librosa/core/spectrum.py", line 217, in stft
      util.valid_audio(y)
    File "/scratch2/whavard/.conda/envs/ALICE/lib/python3.6/site-packages/librosa/util/utils.py", line 310, in valid_audio
      raise ParameterError("Audio buffer is not finite everywhere")
  librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere

orasanen commented 2 years ago

That's strange. I've never seen that before, and I'm not sure how to reproduce it without the data. I did some googling, and there were suggestions that Librosa throws this kind of error for NaN inputs. Could you check whether the y in run_SylNet (the waveform that was read) contains only finite values, i.e., no NaNs or Infs? Another thing to try, which should not affect the results, is to add a very small white-noise floor to y (i.e., a vector of very small zero-mean random numbers). This might help if the signal has exact zeros at its onset or offset, which sometimes throws feature extractors off.
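
The two checks suggested above could look roughly like the sketch below. The helper names `count_non_finite` and `add_noise_floor`, the noise scale of 1e-8, and the fixed seed are assumptions made for illustration; none of them come from run_SylNet.py.

```python
import numpy as np


def count_non_finite(y):
    """Return how many samples of the waveform are NaN and how many are +/-Inf."""
    y = np.asarray(y, dtype=np.float64)
    return {"nan": int(np.isnan(y).sum()), "inf": int(np.isinf(y).sum())}


def add_noise_floor(y, scale=1e-8, seed=0):
    """Add zero-mean Gaussian noise of very small amplitude to the waveform.

    The scale value is an assumed placeholder; pick one well below the
    quietest meaningful part of the signal.
    """
    y = np.asarray(y, dtype=np.float64)
    rng = np.random.RandomState(seed)
    return y + scale * rng.randn(*y.shape)
```

Note that a noise floor only removes exact zeros. A sample that is already NaN or Inf stays non-finite after the addition, so the count is worth running first.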

William-N-Havard commented 2 years ago

The VTC (or rather pyannote-audio) also uses librosa so I'm not sure why it only occurs when running ALICE. But indeed, there are some np.inf in the audio files that raise the exception. I'll try adding some noise and see how it goes!
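
Since adding noise to an infinite sample leaves it infinite, one option is to replace the non-finite samples before feature extraction. This is only a sketch of that idea, not something the thread confirms was done; `zero_non_finite` is a hypothetical helper, and replacing with zero (rather than interpolating or dropping the file) is an assumption.

```python
import numpy as np


def zero_non_finite(y):
    """Return a float32 copy of the waveform with NaN and +/-Inf samples set to 0."""
    y = np.asarray(y, dtype=np.float32)
    # np.isfinite is False for NaN, +Inf and -Inf alike.
    return np.where(np.isfinite(y), y, np.float32(0.0))
```

It would be applied to y right after the waveform is read and before the call to librosa.feature.melspectrogram.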

orasanen commented 2 years ago

Also linked this issue on the SylNet repo side.

orasanen commented 2 years ago

Another possibility: VTC is used as a front end for SylNet (and other feature extraction) to split the long-form input data into "utterances". So, if VTC produces a segment that is too short for the Librosa feature extractor, that might cause an error. If this is the case, I could add a minimum duration threshold to the data-splitting stage.
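
A minimum duration threshold of the kind described above might be sketched as follows. The segment representation as (onset, offset) pairs in seconds, the function name `filter_short_segments`, and the 0.1 s default are all assumptions for illustration; the actual ALICE splitting code may store segments differently.

```python
def filter_short_segments(segments, min_duration=0.1):
    """Keep only segments whose duration is at least min_duration seconds.

    segments: iterable of (onset, offset) pairs in seconds (assumed format).
    """
    return [(onset, offset) for onset, offset in segments
            if (offset - onset) >= min_duration]
```

For example, `filter_short_segments([(0.0, 0.05), (1.0, 1.5)])` would drop the 50 ms segment and keep the 500 ms one.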

orasanen / ALICE

Audio buffer is not finite everywhere #24