iver56 / audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.
https://iver56.github.io/audiomentations/
MIT License

Speed table for audiomentations #190

Open ZFTurbo opened 2 years ago

ZFTurbo commented 2 years ago

I wrote a small script to measure the speed of each augmentation. I made it for my own use, but I think it would be useful to have it somewhere in the repository.

Aug: AddGaussianNoise Time: 2.23 sec Per sample: 0.022290 sec
Aug: AddGaussianSNR Time: 2.58 sec Per sample: 0.025806 sec
Aug: ApplyImpulseResponse Time: 4.13 sec Per sample: 0.041310 sec
Aug: BandPassFilter Time: 1.02 sec Per sample: 0.010221 sec
Aug: BandStopFilter Time: 1.01 sec Per sample: 0.010077 sec
Aug: HighPassFilter Time: 0.92 sec Per sample: 0.009171 sec
Aug: HighShelfFilter Time: 0.85 sec Per sample: 0.008480 sec
Aug: LowPassFilter Time: 0.91 sec Per sample: 0.009150 sec
Aug: LowShelfFilter Time: 0.85 sec Per sample: 0.008496 sec
Aug: PeakingFilter Time: 0.85 sec Per sample: 0.008530 sec
Aug: ClippingDistortion Time: 1.47 sec Per sample: 0.014670 sec
Aug: GainTransition Time: 0.42 sec Per sample: 0.004211 sec
Aug: Mp3Compression Time: 38.12 sec Per sample: 0.381207 sec
Aug: LoudnessNormalization Time: 3.23 sec Per sample: 0.032335 sec
Aug: PitchShift Time: 70.60 sec Per sample: 0.705962 sec
Aug: PolarityInversion Time: 0.10 sec Per sample: 0.001050 sec
Aug: Resample Time: 26.85 sec Per sample: 0.268525 sec
Aug: Reverse Time: 0.00 sec Per sample: 0.000010 sec
Aug: RoomSimulator Time: 31.89 sec Per sample: 0.318857 sec
Aug: SevenBandParametricEQ Time: 5.95 sec Per sample: 0.059508 sec
Aug: Shift Time: 0.10 sec Per sample: 0.001000 sec
Aug: TanhDistortion Time: 1.80 sec Per sample: 0.018049 sec
Aug: TimeMask Time: 0.13 sec Per sample: 0.001340 sec
Aug: TimeStretch Time: 40.79 sec Per sample: 0.407884 sec
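
To rank these figures, the printed lines can be parsed back into numbers. This is a minimal sketch that assumes the exact `Aug: ... Time: ... Per sample: ...` format shown above; the helper names `parse_timings` and `slowest_first` are made up for illustration and are not part of audiomentations.

```python
import re

# A few lines copied from the output above; paste the full listing to rank everything.
RAW = """\
Aug: AddGaussianNoise Time: 2.23 sec Per sample: 0.022290 sec
Aug: PitchShift Time: 70.60 sec Per sample: 0.705962 sec
Aug: Reverse Time: 0.00 sec Per sample: 0.000010 sec
Aug: TimeStretch Time: 40.79 sec Per sample: 0.407884 sec
"""

# Matches one printed result line and captures name, total time and per-sample time.
LINE_RE = re.compile(r"Aug: (\w+) Time: ([\d.]+) sec Per sample: ([\d.]+) sec")


def parse_timings(text):
    """Return {transform name: seconds per sample} parsed from the printed lines."""
    return {m.group(1): float(m.group(3)) for m in LINE_RE.finditer(text)}


def slowest_first(timings):
    """Return (name, seconds) pairs sorted from slowest to fastest."""
    return sorted(timings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, seconds in slowest_first(parse_timings(RAW)):
        print(f"{name:<20} {seconds * 1000:10.3f} ms per sample")
```

On the sample lines this puts PitchShift and TimeStretch at the top of the list.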

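import glob
import os
import time

import librosa
import soundfile as sf
import tqdm
from audiomentations import *

# INPUT_PATH and CACHE_PATH are not defined in this snippet; set them to your own
# directories (with a trailing slash) before running.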

def check_audiomentations_speed(
        path_to_wav_files,
        maximum_files=10,
        sample_rate=44100,
        save_to_check=False,
):
    wav_paths = glob.glob(path_to_wav_files + '/*.wav')[:maximum_files]
    data = []
    for i, wav_path in tqdm.tqdm(enumerate(wav_paths)):
        audio1, _ = librosa.load(wav_path, sr=sample_rate, mono=False)
        data.append(audio1)

        if save_to_check:
            audio1 = audio1.transpose()
            out_folder = CACHE_PATH + 'original' + '/'
            os.makedirs(out_folder, exist_ok=True)
            save_path = out_folder + os.path.basename(wav_paths[i])
            sf.write(save_path, audio1, samplerate=sample_rate, subtype='float')

    full_list_to_check = [
        AddGaussianNoise(p=1.0, min_amplitude=0.001, max_amplitude=0.025),
        AddGaussianSNR(p=1.0, min_snr_in_db=5, max_snr_in_db=40.0),
        ApplyImpulseResponse(p=1.0, ir_path=INPUT_PATH + 'ir_data/', lru_cache_size=500, leave_length_unchanged=True),

        BandPassFilter(p=1.0, min_center_freq=200.0, max_center_freq=4000.0, min_bandwidth_fraction=0.5, max_bandwidth_fraction=1.99, min_rolloff=12, max_rolloff=24,),
        BandStopFilter(p=1.0, min_center_freq=200.0, max_center_freq=4000.0, min_bandwidth_fraction=0.5, max_bandwidth_fraction=1.99, min_rolloff=12, max_rolloff=24,),
        HighPassFilter(p=1.0, min_cutoff_freq=20, max_cutoff_freq=2400, min_rolloff=12, max_rolloff=24, zero_phase=False, ),
        HighShelfFilter(p=1.0, min_center_freq=300.0, max_center_freq=7500.0, min_gain_db=-18.0, max_gain_db=18.0, min_q=0.1, max_q=0.999,),
        LowPassFilter(p=1.0,  min_cutoff_freq=150, max_cutoff_freq=7500, min_rolloff=12, max_rolloff=24, zero_phase=False,),
        LowShelfFilter(p=1.0, min_center_freq=50.0, max_center_freq=4000.0, min_gain_db=-18.0, max_gain_db=18.0, min_q=0.1, max_q=0.999,),
        PeakingFilter(p=1.0, min_center_freq=50.0, max_center_freq=7500.0, min_gain_db=-24, max_gain_db=24, min_q=0.5, max_q=5.0, ),

        ClippingDistortion(p=1.0, min_percentile_threshold=0, max_percentile_threshold=40),
        GainTransition(p=1.0, min_gain_in_db=-24.0,  max_gain_in_db=6.0, min_duration=0.2, max_duration=6.0,),
        Mp3Compression(p=1.0, min_bitrate=8, max_bitrate=128, backend="pydub",),
        LoudnessNormalization(p=1.0, min_lufs_in_db=-31, max_lufs_in_db=-13),
        PitchShift(p=1.0, min_semitones=-4, max_semitones=4),
        PolarityInversion(p=1.0, ),
        Resample(p=1.0, min_sample_rate=8000, max_sample_rate=44100),
        Reverse(p=1.0, ),
        RoomSimulator(p=1.0, ),
        SevenBandParametricEQ(p=1.0, min_gain_db=-12.0, max_gain_db=12.0,),
        Shift(p=1.0, min_fraction=-0.5, max_fraction=0.5, rollover=True, fade=False, fade_duration=0.01,),
        TanhDistortion(p=1.0, min_distortion=0.01, max_distortion=0.7),
        TimeMask(p=1.0, min_band_part=0.0, max_band_part=0.1, fade=False),
        TimeStretch(p=1.0, min_rate=0.8, max_rate=1.25, leave_length_unchanged=True, ),
    ]

    # Not available
    # AddShortNoises(p=1.0),
    # AirAbsorption()

    for f in full_list_to_check:
        name = f.__class__.__name__
        aug1 = Compose([
            f,
        ], p=1.0)

        start_time = time.time()
        for i, wav in enumerate(data):
            try:
                audio1 = aug1(samples=wav, sample_rate=sample_rate)
            except Exception as e:
                print('Augmentation error: {}'.format(str(e)))
                continue
            if save_to_check:
                audio1 = audio1.transpose()
                out_folder = CACHE_PATH + name + '/'
                os.makedirs(out_folder, exist_ok=True)
                save_path = out_folder + os.path.basename(wav_paths[i])
                sf.write(save_path, audio1, samplerate=sample_rate, subtype='float')

        delta = time.time() - start_time
        print('Aug: {} Time: {:.2f} sec Per sample: {:.6f} sec'.format(name, delta, delta / len(data)))

if __name__ == '__main__':
    path_to_wav_files = INPUT_PATH + 'train_wav/'
    maximum_files = 100
    sample_rate = 44100
    save_to_check = False

    check_audiomentations_speed(
        path_to_wav_files,
        maximum_files,
        sample_rate,
        save_to_check,
    )

iver56 commented 2 years ago

Measuring execution times is definitely relevant :) There's also code for doing that in the demo script, which will output something similar:

AddBackgroundNoiseRelative       0.056 s (std: 0.064 s)
AddBackgroundNoiseAbsolute       0.054 s (std: 0.063 s)
AddBackgroundNoiseWithTransform  0.055 s (std: 0.064 s)
AddGaussianNoise                 0.011 s (std: 0.000 s)
AddGaussianSNR                   0.012 s (std: 0.000 s)
ApplyImpulseResponseWithTail     0.030 s
ApplyImpulseResponseLeaveLengthUnchanged 0.029 s
AddShortNoisesAbsolute           0.019 s (std: 0.009 s)
AddShortNoisesRelative           0.018 s (std: 0.012 s)
AddShortNoisesWithSignalGain     0.041 s (std: 0.018 s)
AddShortNoisesWithNoiseTransform 4.793 s (std: 2.357 s)
BandPassFilter                   0.006 s (std: 0.001 s)
BandStopFilter                   0.006 s (std: 0.001 s)
ClippingDistortion               0.007 s (std: 0.000 s)
FrequencyMask                    0.008 s (std: 0.000 s)
Gain                             0.001 s (std: 0.000 s)
GainTransition                   0.004 s (std: 0.002 s)
HighPassFilter                   0.005 s (std: 0.000 s)
HighShelfFilter                  0.004 s (std: 0.000 s)
LowPassFilter                    0.005 s (std: 0.001 s)
LowShelfFilter                   0.004 s (std: 0.000 s)
PitchShift                       0.475 s (std: 0.052 s)
LoudnessNormalization            0.018 s (std: 0.002 s)
Mp3CompressionLameenc            3.802 s (std: 0.447 s)
Mp3CompressionPydub              4.390 s (std: 0.408 s)
Normalize                        0.001 s
PaddingSilenceEnd                0.001 s (std: 0.000 s)
PaddingWrapEnd                   0.001 s (std: 0.000 s)
PaddingReflectEnd                0.001 s (std: 0.000 s)
PaddingSilenceStart              0.001 s (std: 0.000 s)
PaddingWrapStart                 0.001 s (std: 0.000 s)
PeakingFilter                    0.006 s (std: 0.000 s)
PolarityInversion                0.001 s
Resample                         0.376 s (std: 0.041 s)
Reverse                          0.000 s
RoomSimulator                    0.392 s (std: 0.143 s)
SevenBandParametricEQ            0.033 s (std: 0.002 s)
ShiftWithoutFade                 0.001 s (std: 0.000 s)
ShiftWithShortFade               0.001 s (std: 0.000 s)
ShiftWithoutRolloverWithLongFade 0.001 s (std: 0.000 s)
TanhDistortion                   0.012 s (std: 0.002 s)
TimeMask                         0.001 s (std: 0.000 s)
TimeStretch                      0.218 s (std: 0.023 s)
Trim                             0.006 s
BigCompose                       0.314 s (std: 0.316 s)
AirAbsorption                    0.049 s (std: 0.003 s)
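
Figures with a mean and a standard deviation, like those listed above, can be gathered with a small timing helper. The following is a sketch only, not the demo script's actual code: `time_callable` and `dummy_transform` are hypothetical names, and the dummy stands in for a real call such as `transform(samples=samples, sample_rate=sample_rate)`.

```python
import statistics
import time


def time_callable(fn, inputs, repeats=3):
    """Call fn(x) `repeats` times for every x in inputs.

    Returns (mean, std) of the individual call durations in seconds.
    `inputs` must not be empty.
    """
    durations = []
    for x in inputs:
        for _ in range(repeats):
            start = time.perf_counter()  # monotonic, high-resolution clock
            fn(x)
            durations.append(time.perf_counter() - start)
    return statistics.mean(durations), statistics.pstdev(durations)


def dummy_transform(samples):
    # Hypothetical stand-in for an augmentation; it only flips the sign of each sample.
    return [-s for s in samples]


if __name__ == "__main__":
    inputs = [[0.0] * 44100 for _ in range(5)]  # five one-second mono buffers at 44.1 kHz
    mean, std = time_callable(dummy_transform, inputs)
    print("{:<32} {:.3f} s (std: {:.3f} s)".format("DummyTransform", mean, std))
```

Using `time.perf_counter()` rather than `time.time()` avoids jumps from system clock adjustments, which matters for transforms that finish in about a millisecond.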

I think if we make a plot (with a logarithmic execution time axis), it can be included in the README so people can get an idea of how quick or slow the transforms are.
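
A plot along those lines could look like the sketch below. It assumes matplotlib is installed. The `TIMES` values are a subset copied from the listing above, while `prepare`, `plot_exec_times`, the output file name and the 0.1 ms floor are choices made for this example. The floor is needed because a value printed as 0.000 s cannot be drawn on a logarithmic axis.

```python
# Mean execution times in seconds, copied from a subset of the listing above.
TIMES = {
    "Reverse": 0.000,
    "Gain": 0.001,
    "AddGaussianNoise": 0.011,
    "TimeStretch": 0.218,
    "Resample": 0.376,
    "PitchShift": 0.475,
    "Mp3CompressionPydub": 4.390,
}


def prepare(times, floor=1e-4):
    """Clamp times to a positive floor and sort them from fastest to slowest."""
    clamped = [(name, max(seconds, floor)) for name, seconds in times.items()]
    return sorted(clamped, key=lambda item: item[1])


def plot_exec_times(times, out_path="exec_times.png"):
    """Save a horizontal bar chart with a logarithmic time axis to out_path."""
    # Imported here so that prepare() can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    items = prepare(times)
    names = [name for name, _ in items]
    seconds = [value for _, value in items]

    fig, ax = plt.subplots(figsize=(8, 0.35 * len(names) + 1))
    ax.barh(names, seconds)
    ax.set_xscale("log")
    ax.set_xlabel("Execution time per call (s, log scale)")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

Calling `plot_exec_times(TIMES)` writes `exec_times.png` to the current directory. Sorting the bars makes the slow transforms (PitchShift, Resample, Mp3Compression) stand out.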

ZFTurbo commented 2 years ago

Yes, I think this info is very useful. Maybe it could be a separate page with the results, linked from the main page.

I also propose moving the Changelog from the main page to a separate file such as "Changes.md", with a link to it.

I noticed that PitchShift and TimeStretch are very useful but very slow. We need to think about how to speed them up.

iver56 commented 2 years ago

According to Spijkervet, the pitch shifting implementation in WavAugment is fast:

https://twitter.com/JanneSpijkervet/status/1292411014584180736

There's also a pitch shift transform in https://github.com/asteroid-team/torch-audiomentations, but that isn't very fast.

Then there's https://github.com/maxrmorrison/clpcnet, which is good but only works for speech, and only at a 16 kHz sample rate.