ZFTurbo opened this issue 2 years ago (status: Open)
Measuring execution times is definitely relevant :) The demo script also contains code for doing that, and it outputs something like this:
AddBackgroundNoiseRelative 0.056 s (std: 0.064 s)
AddBackgroundNoiseAbsolute 0.054 s (std: 0.063 s)
AddBackgroundNoiseWithTransform 0.055 s (std: 0.064 s)
AddGaussianNoise 0.011 s (std: 0.000 s)
AddGaussianSNR 0.012 s (std: 0.000 s)
ApplyImpulseResponseWithTail 0.030 s
ApplyImpulseResponseLeaveLengthUnchanged 0.029 s
AddShortNoisesAbsolute 0.019 s (std: 0.009 s)
AddShortNoisesRelative 0.018 s (std: 0.012 s)
AddShortNoisesWithSignalGain 0.041 s (std: 0.018 s)
AddShortNoisesWithNoiseTransform 4.793 s (std: 2.357 s)
BandPassFilter 0.006 s (std: 0.001 s)
BandStopFilter 0.006 s (std: 0.001 s)
ClippingDistortion 0.007 s (std: 0.000 s)
FrequencyMask 0.008 s (std: 0.000 s)
Gain 0.001 s (std: 0.000 s)
GainTransition 0.004 s (std: 0.002 s)
HighPassFilter 0.005 s (std: 0.000 s)
HighShelfFilter 0.004 s (std: 0.000 s)
LowPassFilter 0.005 s (std: 0.001 s)
LowShelfFilter 0.004 s (std: 0.000 s)
PitchShift 0.475 s (std: 0.052 s)
LoudnessNormalization 0.018 s (std: 0.002 s)
Mp3CompressionLameenc 3.802 s (std: 0.447 s)
Mp3CompressionPydub 4.390 s (std: 0.408 s)
Normalize 0.001 s
PaddingSilenceEnd 0.001 s (std: 0.000 s)
PaddingWrapEnd 0.001 s (std: 0.000 s)
PaddingReflectEnd 0.001 s (std: 0.000 s)
PaddingSilenceStart 0.001 s (std: 0.000 s)
PaddingWrapStart 0.001 s (std: 0.000 s)
PeakingFilter 0.006 s (std: 0.000 s)
PolarityInversion 0.001 s
Resample 0.376 s (std: 0.041 s)
Reverse 0.000 s
RoomSimulator 0.392 s (std: 0.143 s)
SevenBandParametricEQ 0.033 s (std: 0.002 s)
ShiftWithoutFade 0.001 s (std: 0.000 s)
ShiftWithShortFade 0.001 s (std: 0.000 s)
ShiftWithoutRolloverWithLongFade 0.001 s (std: 0.000 s)
TanhDistortion 0.012 s (std: 0.002 s)
TimeMask 0.001 s (std: 0.000 s)
TimeStretch 0.218 s (std: 0.023 s)
Trim 0.006 s
BigCompose 0.314 s (std: 0.316 s)
AirAbsorption 0.049 s (std: 0.003 s)
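Numbers like these come from calling each transform repeatedly and reporting the mean and standard deviation of the run time. Below is a minimal sketch of that idea, not the demo script's actual code. It assumes stdlib only: `gain` and `reverse` are hypothetical stand-in transforms on plain Python lists, and `time_transform` / `format_result` are illustrative helper names. The stand-ins are called with the keyword arguments `samples=` and `sample_rate=`, the same way audiomentations transforms are called, so a real transform (given a NumPy float32 array and constructed with `p=1.0` so it always applies) could be timed in their place.

```python
import random
import statistics
import time
from typing import Callable, List, Tuple


# Hypothetical stand-in transforms that work on a plain list of float samples.
# They take samples= and sample_rate= keywords, like an audiomentations transform,
# so a real transform could be passed to time_transform instead.
def gain(samples: List[float], sample_rate: int) -> List[float]:
    return [s * 0.5 for s in samples]


def reverse(samples: List[float], sample_rate: int) -> List[float]:
    return samples[::-1]


def time_transform(
    transform: Callable[..., object],
    samples: List[float],
    sample_rate: int,
    runs: int = 5,
) -> Tuple[float, float]:
    """Call the transform `runs` times; return mean and std of wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        transform(samples=samples, sample_rate=sample_rate)
        durations.append(time.perf_counter() - start)
    # pstdev returns 0.0 for a single run instead of raising
    return statistics.mean(durations), statistics.pstdev(durations)


def format_result(name: str, mean: float, std: float) -> str:
    # Mirrors the "Name 0.056 s (std: 0.064 s)" layout of the output above
    return f"{name} {mean:.3f} s (std: {std:.3f} s)"


if __name__ == "__main__":
    sample_rate = 16000
    rng = random.Random(42)
    # 5 seconds of uniform noise as test input
    samples = [rng.uniform(-1.0, 1.0) for _ in range(5 * sample_rate)]
    for name, fn in [("Gain", gain), ("Reverse", reverse)]:
        mean, std = time_transform(fn, samples, sample_rate)
        print(format_result(name, mean, std))
```

`time.perf_counter` is used rather than `time.time` because it is a monotonic, high-resolution clock intended for measuring short intervals; repeating the calls is what makes the std column meaningful.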
If we make a plot (with a logarithmic execution-time axis), it could be included in the README so people can get an idea of how fast or slow each transform is.
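A logarithmic axis matters here because the timings above span several orders of magnitude (from about 0.001 s for Gain to several seconds for the MP3 compression transforms). As a rough, stdlib-only sketch of the idea, assuming the "Name 0.123 s (std: 0.045 s)" output format shown above, the hypothetical helpers below parse that output and render text bars whose lengths follow log10 of the time. A value of 0.000 s (such as Reverse) has no logarithm, so it is clamped to an assumed `floor` for drawing only.

```python
import math
import re
from typing import List, Tuple

# Matches "PitchShift 0.475 s (std: 0.052 s)" as well as "Normalize 0.001 s"
LINE_RE = re.compile(r"^(\w+)\s+([\d.]+) s(?: \(std: [\d.]+ s\))?$")


def parse_results(text: str) -> List[Tuple[str, float]]:
    """Extract (transform name, mean seconds) pairs from demo-style output lines."""
    results = []
    for line in text.strip().splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results.append((match.group(1), float(match.group(2))))
    return results


def log_bars(
    results: List[Tuple[str, float]], floor: float = 0.001, width: int = 40
) -> List[str]:
    """Render text bars whose lengths are proportional to log10(time), fastest first.

    Times below `floor` (e.g. "Reverse 0.000 s") are clamped for the bar only,
    because log10(0) is undefined; the printed label keeps the measured value.
    """
    if not results:
        return []
    lo = math.log10(floor)
    hi = max(math.log10(max(t, floor)) for _, t in results)
    span = max(hi - lo, 1e-9)  # avoid division by zero when all times are equal
    name_width = max(len(name) for name, _ in results)
    lines = []
    for name, t in sorted(results, key=lambda item: item[1]):
        fraction = (math.log10(max(t, floor)) - lo) / span
        length = 1 + round(fraction * (width - 1))  # between 1 and width characters
        lines.append(f"{name:<{name_width}} {'#' * length} {t:.3f} s")
    return lines


if __name__ == "__main__":
    demo_output = """
AddGaussianNoise 0.011 s (std: 0.000 s)
Reverse 0.000 s
PitchShift 0.475 s (std: 0.052 s)
Mp3CompressionPydub 4.390 s (std: 0.408 s)
"""
    for bar in log_bars(parse_results(demo_output)):
        print(bar)
```

For an actual README image, the same parsed pairs could be drawn as a horizontal bar chart in matplotlib with `ax.set_xscale("log")`, which gives the logarithmic axis directly.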
Yes, I think this info is very useful. Maybe it could be a separate page with the results, linked from the main page.
I also propose moving the changelog from the main page to a separate file, such as "Changes.md", and linking to it.
I noticed that PitchShift and TimeStretch are very useful but very slow... We need to think about how to speed them up.
According to Spijkervet, the pitch shifting implementation in WavAugment is fast
https://twitter.com/JanneSpijkervet/status/1292411014584180736
There's also a pitch shift transform in https://github.com/asteroid-team/torch-audiomentations, but it isn't very fast
Then there's https://github.com/maxrmorrison/clpcnet which is good, but only works for speech and only with a 16 kHz sample rate
I created a small script to test the speed of the augmentations. I made it for myself, but I think it would be useful to have it somewhere in the repository.