Open siriusmehta opened 2 years ago
wav: The input waveform, a 1-D tensor representing a time-domain audio signal.

frame_length: The length (in samples) of the STFT analysis window, i.e. the number of samples included in each frame. A longer frame_length gives better frequency resolution but poorer time resolution. A common value is 320.

frame_step: The number of samples between the starting points of consecutive frames. This determines the overlap between adjacent frames. A smaller frame_step gives finer time resolution but produces more frames, so it costs more computation. A common value is 32, which with frame_length=320 corresponds to 90% overlap between frames.

The tf.signal.stft function returns a complex-valued tensor representing the STFT of the input waveform.
Each element of the STFT tensor is a complex number that encodes the magnitude and phase of one frequency component at a particular time frame and frequency bin.
With this representation you can perform audio processing tasks such as spectrogram visualization, audio synthesis, or feature extraction for speech or music recognition.
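To make the numbers above concrete, here is a minimal sketch in plain Python (no TensorFlow needed) of the arithmetic behind them. It assumes tf.signal.stft is called with its defaults (pad_end=False, and fft_length taken as the smallest power of two that is at least frame_length). The 16 kHz sample rate and the 48000-sample clip length are assumptions for illustration only, and the helper names stft_shape and overlap_fraction are hypothetical, not part of any library.

```python
def stft_shape(num_samples, frame_length=320, frame_step=32):
    """Expected (num_frames, num_bins) of tf.signal.stft, assuming
    pad_end=False and the default fft_length."""
    # Default fft_length: smallest power of two >= frame_length.
    fft_length = 1
    while fft_length < frame_length:
        fft_length *= 2
    # Only frames that fit entirely inside the signal are kept.
    if num_samples < frame_length:
        num_frames = 0
    else:
        num_frames = 1 + (num_samples - frame_length) // frame_step
    # A real-valued input yields fft_length // 2 + 1 unique frequency bins.
    num_bins = fft_length // 2 + 1
    return num_frames, num_bins


def overlap_fraction(frame_length, frame_step):
    """Fraction of each frame shared with the next one."""
    return (frame_length - frame_step) / frame_length


SAMPLE_RATE = 16000  # assumption: 16 kHz audio

print(overlap_fraction(320, 32))   # 0.9, i.e. 90% overlap
print(320 / SAMPLE_RATE)           # window duration in seconds: 0.02
print(32 / SAMPLE_RATE)            # hop duration in seconds: 0.002
print(stft_shape(48000))           # 3 s clip at 16 kHz: (1491, 257)
```

Under these assumptions, 320 and 32 are not derived from a formula: 320 samples is a 20 ms window at 16 kHz and 32 samples is a 2 ms hop, so they are tunable choices that trade frequency resolution against time resolution and compute.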
Hi Nick,
Could you please shed some light on how the values 320 for frame_length and 32 for frame_step were chosen in the line below?
spectrogram = tf.signal.stft(wav, frame_length=320, frame_step=32)
Thanks