iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License

Fix bug: FrequencyMask sometimes outputs NaN values #36

Closed by iver56 4 years ago

iver56 commented 4 years ago

The bug can be reproduced with, for example, these parameters:

should_apply=True,
bandwidth=600,
freq_start=172,

iver56 commented 4 years ago

I think this is related to an unstable filter, where values explode exponentially and eventually end up outside the valid range of float32.
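One way to probe this hypothesis is to look at the filter's pole magnitudes, since a pole on or outside the unit circle makes the output grow without bound. The sketch below is only an illustration: the helper name `max_pole_radius` is hypothetical, the 16 kHz sample rate is an assumption (the report does not state one), and the band edges are derived from the reported freq_start=172 and bandwidth=600. It compares the radius recovered from the `(b, a)` polynomial coefficients with the radius of the poles that SciPy returns directly in `zpk` form.

```python
import numpy as np
from scipy.signal import butter


def max_pole_radius(lowcut, highcut, fs, order=5):
    """Return the largest pole magnitude seen via (b, a) and via zpk output."""
    nyq = 0.5 * fs
    band = [lowcut / nyq, highcut / nyq]

    # Transfer-function form: poles must be recovered as polynomial roots,
    # which is sensitive to rounding when the poles cluster near z = 1.
    b, a = butter(order, band, btype="bandstop", output="ba")
    radius_ba = float(np.max(np.abs(np.roots(a))))

    # Zero-pole-gain form: the poles are returned directly.
    z, p, k = butter(order, band, btype="bandstop", output="zpk")
    radius_zpk = float(np.max(np.abs(p)))

    return radius_ba, radius_zpk


# Assumed sample rate of 16 kHz; band is freq_start .. freq_start + bandwidth.
radius_ba, radius_zpk = max_pole_radius(172, 172 + 600, fs=16000)
print(f"(b, a) form: {radius_ba:.6f}, zpk form: {radius_zpk:.6f}")
```

If the radius computed from the `(b, a)` coefficients reaches 1 or more while the `zpk` radius stays below 1, that would point to coefficient rounding rather than the filter design itself.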

lukewys commented 4 years ago

Dear @iver56, I encountered the same problem when using FrequencyMask. I am trying to apply data augmentation to the audio before converting it to a Mel spectrogram. Sometimes it raises: librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere.

I think this is caused by NaN values in the audio array.

Is there any way to bypass the problem? Thanks in advance!

iver56 commented 4 years ago

Here are some options:

1) Check for NaN values in the result, and discard or redo any augmentations that failed.
2) Perform frequency masking in spectrogram space instead of applying it to the waveform signal. See also https://github.com/zcaceres/spec_augment
3) Try to fix the underlying bug in audiomentations (I am happy to receive and review pull requests).

Here is a code example for option 1:

import numpy as np
from audiomentations import Compose

audio = ...
augmenter = Compose([ ... ])

perturbed_audio = augmenter(audio, sample_rate)

if np.isnan(perturbed_audio).any():
    # insert code to discard the result, because it contains invalid values
    pass
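The "redo" variant of option 1 could look roughly like the sketch below. The helper name `augment_with_retry` and the `max_attempts` parameter are hypothetical and not part of audiomentations. The sketch assumes the augmenter is callable as `augmenter(audio, sample_rate)`, as in the example above, and that it draws new random parameters on each call. It checks with `np.isfinite`, which rejects infinite values as well as NaN.

```python
import numpy as np


def augment_with_retry(augmenter, audio, sample_rate, max_attempts=3):
    """Apply `augmenter`, retrying while the output has NaN or infinite values.

    Falls back to the unmodified input if every attempt fails.
    """
    for _ in range(max_attempts):
        perturbed = augmenter(audio, sample_rate)
        if np.isfinite(perturbed).all():
            return perturbed
    # Hypothetical fallback policy: keep the clean signal rather than bad data.
    return audio
```

Returning the original audio on repeated failure is one possible policy; raising an exception or skipping the sample would work equally well, depending on the training pipeline.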

lukewys commented 4 years ago

Thanks very much! Since I eventually want a spectrogram, I decided to do the masking in the spectral domain. Thanks again!
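For readers taking the same route, masking in the spectral domain can be as simple as overwriting a band of frequency rows. This is a minimal sketch, not the spec_augment implementation. The function name `mask_frequency_band` is hypothetical, and it assumes a 2-D array shaped (frequency bins, time frames), such as a log-Mel spectrogram.

```python
import numpy as np


def mask_frequency_band(spectrogram, start_bin, num_bins, fill_value=None):
    """Return a copy of `spectrogram` with a band of frequency bins masked.

    Rows start_bin .. start_bin + num_bins - 1 are overwritten with
    `fill_value`, which defaults to the spectrogram's minimum value.
    """
    masked = spectrogram.copy()
    if fill_value is None:
        fill_value = spectrogram.min()
    masked[start_bin:start_bin + num_bins, :] = fill_value
    return masked
```

Because no recursive filter is involved, this approach cannot introduce NaN values on its own.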

kvilouras commented 4 years ago

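@iver56 sosfilt fixes this issue. Filtering with second-order sections is numerically more stable than the transfer-function form, so the output no longer contains NaN values. Only the following functions need to be replaced.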

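import numpy as np
from scipy.signal import butter, sosfilt

# Replacement methods for the transform class (indent to match the class body).
def __butter_bandstop(self, lowcut, highcut, fs, order=5):
    # Normalize the cutoff frequencies by the Nyquist frequency.
    nyq = 0.5 * fs
    low = lowcut / nyq
    high = highcut / nyq
    # Second-order sections are numerically more stable than (b, a) coefficients.
    sos = butter(order, [low, high], btype='bandstop', output='sos')
    return sos

def __butter_bandstop_filter(self, data, lowcut, highcut, fs, order=5):
    sos = self.__butter_bandstop(lowcut, highcut, fs, order=order)
    y = sosfilt(sos, data).astype(np.float32)
    return y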
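To try the same idea outside the library, here is a standalone sketch that runs the second-order-sections filter on random noise and checks that the result is finite. The function name `bandstop_sos`, the 16 kHz sample rate, and the noise input are assumptions for illustration. The band edges come from the parameters reported at the top of this issue (freq_start=172, bandwidth=600).

```python
import numpy as np
from scipy.signal import butter, sosfilt


def bandstop_sos(data, lowcut, highcut, fs, order=5):
    """Band-stop filter `data` using a Butterworth design in SOS form."""
    nyq = 0.5 * fs
    sos = butter(order, [lowcut / nyq, highcut / nyq],
                 btype="bandstop", output="sos")
    return sosfilt(sos, data).astype(np.float32)


fs = 16000  # assumed sample rate, not stated in the report
rng = np.random.default_rng(0)
noise = rng.uniform(-1.0, 1.0, size=fs).astype(np.float32)  # one second of noise

filtered = bandstop_sos(noise, 172, 172 + 600, fs)
print("all finite:", bool(np.isfinite(filtered).all()))
```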

iver56 commented 4 years ago

Sounds good! Would you like to make a pull request?

iver56 commented 4 years ago

I added a failing test: https://github.com/iver56/audiomentations/commit/defce915f796394aef3e51e72bff18da8b63baff
Fixed by kvilouras: https://github.com/iver56/audiomentations/pull/40
The build is green after merge: https://circleci.com/gh/iver56/audiomentations/106