Open sammlapp opened 6 months ago
Also: request from https://github.com/kitzeslab/opensoundscape/issues/947
we might want to use an average of a sliding window stft to compute the spectrum. IE we could call https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.ShortTimeFFT.html#scipy.signal.ShortTimeFFT then take a time average. In this stfft implementation, passing fftmode='onesided2X' produces the output requested in this issue. We can make that the default behavior. For choosing a default of power vs magnitude, we can use magnitude as default, to match default in Spectrogram.from_audio().
We can expose all arguments to the user (kwargs), including whether to get power spectral density (FFT2 / N) or magnitude scaling (FFT/N).
Defaults for scipy.signal.ShortTimeFFT.spectrogram will produce slightly different time bins than scipy.signal.spectrogram, I'm not sure if we should just go with the defaults. Also, our usage of default mode
arg (which defaults to 'psd') with scaling=spectrum
seems to be an "invalid" combination that is not supported by scipy.signal.ShortTimeFFT.spectrogram
. Lack of clear documentation of the old scipy.signal.spectrogram made it difficult to use the function correctly.
However, default parameters results in the 'extent' of the spectrogram reaching beyond the original signal length, which is awkward. For instance, if the audio signal is 1 second and the spectrogram has non-overlapping bins of size 0.1 seconds, there will be 11 bins and the entire spectrogram will represent 1.1 seconds of audio. This suggests using the old behavior for time binning instead.
We could also consider switching to torchaudio.transforms.spectrogram, which supports GPU acceleration. However, torchaudio API currently doesn't have a way to get frequency and time arrays corresponding to the spectrogram.
Following this thread we can now match scipy.signal.spectrogram
and scipy.signal.ShortTimeFFT
using .stft_detrend()
for "magnitude" scaling with detrend="constant". However, the combination of using scaling='spectrum', mode='magnitude'
is not reproducible in the newer API and seems to not be a sensible combination of settings.
import scipy
import numpy as np
from matplotlib import pyplot as plt
np.random.seed(0)
x=np.random.random(1000)
noverlap=50
fs=1000
nperseg=100
N=len(x)
# legacy spectrogram applies detrend='constant'
f1,t1, s1 = scipy.signal.spectrogram(x,fs=fs, scaling='spectrum', nperseg=nperseg, noverlap=noverlap, mode='magnitude')
# ShortTimeFFT.spectrogram internally calls stft_detrend, but has different time-windowing behavior at the edges
stfft = scipy.signal.ShortTimeFFT.from_window(('tukey', 0.25), fs=1000, nperseg=100, noverlap=50, fft_mode='onesided',
scale_to='magnitude', phase_shift=None)
s2 = np.sqrt(stfft.spectrogram(x, detr='constant'))
# to exactly match the legacy spectrogram, we can customize the time-binning and use stft_detrend with detr='constant'
SFT = scipy.signal.ShortTimeFFT.from_window(('tukey',.25), fs=fs, nperseg=nperseg, noverlap=noverlap,
fft_mode='onesided',
scale_to='magnitude', phase_shift=None)
Sz3 = SFT.stft_detrend(x, detr='constant', p0=0, p1=(N-noverlap)//SFT.hop, k_offset=nperseg//2)
s3 = np.sqrt(Sz3.real**2 + Sz3.imag**2)
print(f"Are s1 and s3 numerically similar? {np.allclose(s1, s3)}")
print(f'means: {s1.mean()}, {s2.mean()}, {s3.mean()}')
print(f'shapes: {s1.shape}, {s2.shape}, {s3.shape}')
In summary, we will only be able to support exact reproduction of the old behavior if we continue using scipy.signal.spectrogram
, and if we choose new behavior we have to choose between time-windowing behaviors: legacy API creates bins covering the same amount of time as the original signal, whereas new API by default creates time bins extending beyond the edges (first bin is centered on t=0)
supposed to use shorttimefft.spectrogram now