nicknochnack / DeepAudioClassification

Audio classification using Tensorflow

stft parameters #1

Open siriusmehta opened 2 years ago

siriusmehta commented 2 years ago

Hi Nick,

Could you please shed some light on how the values 320 for frame_length and 32 for frame_step were chosen in the line below?

spectrogram = tf.signal.stft(wav, frame_length=320, frame_step=32)

Thanks

shria2003 commented 1 year ago

wav: the input waveform, a 1-D tensor representing a time-domain audio signal.

frame_length: the length (in samples) of the STFT analysis window, i.e. the number of samples included in each frame. A longer frame_length gives better frequency resolution but coarser time resolution. 320 is a common choice.

frame_step: the number of samples between the starting points of consecutive frames, which determines how much adjacent frames overlap. A smaller frame_step gives finer time resolution but increases the amount of computation. With frame_length=320, a frame_step of 32 corresponds to 90% overlap between frames.

The tf.signal.stft function returns a complex-valued tensor representing the STFT of the input waveform.

Each element of that tensor is a complex value carrying the magnitude and phase of one frequency component at a particular time frame and frequency bin.

This means you can use it for various audio-processing tasks such as spectrogram visualization, audio synthesis, or feature extraction for tasks like speech or music recognition.
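
To make the effect of the two parameters concrete, here is a minimal sketch (not taken from the repo); the 48,000-sample waveform and the 16 kHz sample rate are only illustrative assumptions:

import tensorflow as tf

# Illustrative stand-in for a loaded mono clip: 48,000 samples (~3 s at an assumed 16 kHz).
wav = tf.random.normal([48000])

# frame_length=320: each analysis window spans 320 samples (20 ms at 16 kHz),
# which sets the frequency resolution of each spectrogram column.
# frame_step=32: consecutive windows start 32 samples apart, i.e. 288 samples (90%) overlap,
# which sets the time resolution.
stft = tf.signal.stft(wav, frame_length=320, frame_step=32)
print(stft.dtype)   # complex64 -- each element is one complex frequency component
print(stft.shape)   # (1491, 257): 1 + (48000 - 320) // 32 frames, fft_length // 2 + 1 bins
                    # (fft_length defaults to the next power of 2 above frame_length, here 512)

# For visualization or as model input, the complex values are usually reduced
# to magnitudes, often with a trailing channel axis added.
spectrogram = tf.abs(stft)
spectrogram = tf.expand_dims(spectrogram, axis=-1)
print(spectrogram.shape)  # (1491, 257, 1)

In short, frame_length controls how many samples (and therefore how many frequency bins) each column of the spectrogram summarizes, while frame_step controls how densely those columns are spaced in time.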