nii-yamagishilab / project-CURRENNT-public

CURRENNT codes and scripts
GNU General Public License v3.0

About f0 upsampling #3

Closed by zhao2zhang 5 years ago

zhao2zhang commented 5 years ago

The paper says the ‘condition module upsamples the F0 by duplicating f_b to every time step within the b-th frame’. Can you explain in detail how this upsampling is done? And what is a time step? Thank you very much.

TonyWangX commented 5 years ago
  1. About upsampling, see the comments below
  2. Time step denotes the waveform time step t

All the hidden features and signals in the source and filter modules have the same length as the waveform o_1:T. However, F0 and spectral features are extracted once per frame, so there are only B of them (i.e., f_1:B). The condition module therefore needs to upsample f_1:B to \tilde{f}_1:T.

Upsampling is quite straightforward: just copy the value of f_b multiple times. Suppose the waveform sampling rate is 16 kHz and the frame shift is 5 ms (one frame every 5 ms). Then each frame covers 16 samples per ms * 5 ms = 80 samples, so each f_b must be replicated 80 times.

Upsampling in math: \tilde{f}_t = f_{\lfloor t/80 \rfloor}, where \lfloor \cdot \rfloor is floor division (e.g., \lfloor 2/3 \rfloor = 0, \lfloor 4/3 \rfloor = 1).

Upsampling in picture: page 13 of https://www.slideshare.net/akiratamamori/speaker-dependent-wavenet-vocoder
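For readers who prefer code, here is a minimal sketch of the replication described above. It is not taken from the CURRENNT source: the function name `upsample_f0` and its parameters are hypothetical, and it assumes the waveform length is exactly B times the number of samples per frame. It uses zero-based indices for both t and b, which is where the floor division applies directly; with the one-based notation f_1:B and o_1:T, the matching frame index would be the ceiling of t/80 instead.

```python
def upsample_f0(f0_frames, sampling_rate_hz=16000, frame_shift_ms=5):
    """Copy each frame-level F0 value to every waveform sample in that frame.

    Hypothetical helper for illustration; assumes T == B * samples_per_frame.
    """
    # Samples per frame: 16000 Hz * 5 ms / 1000 = 80
    samples_per_frame = sampling_rate_hz * frame_shift_ms // 1000
    num_samples = len(f0_frames) * samples_per_frame
    # Zero-based mapping: sample t takes the F0 of frame t // samples_per_frame
    return [f0_frames[t // samples_per_frame] for t in range(num_samples)]


# Made-up F0 values for three frames (0.0 standing in for an unvoiced frame)
f0 = [100.0, 0.0, 120.0]
f0_up = upsample_f0(f0)
print(len(f0_up))            # 3 frames * 80 samples = 240
print(f0_up[79], f0_up[80])  # last sample of frame 0, first sample of frame 1
```

With NumPy arrays, `np.repeat(f0, 80)` should give the same result in a single call.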

(I realize that "duplicating to every time step within the b-th frame" may be misleading if we consider the overlap of the framing windows.)