Closed cnlinxi closed 1 year ago
This operation interpolates F0. The original F0 also encodes the voiced/unvoiced classification (voiced frames have F0 > 0, unvoiced frames have F0 == 0), so predicting the original F0 directly can lead to errors in classifying frames as voiced or unvoiced. Interpolating F0 can reduce these classification errors.
Skipping the interpolation is also possible, but the model may be more stable with interpolated F0.
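For illustration, here is a minimal sketch of what interpolating F0 over unvoiced frames might look like. It is not the code from dataset.py: the helper name `interpolate_f0`, the use of NumPy, and the choice of linear interpolation are all assumptions made for this example.

```python
import numpy as np


def interpolate_f0(f0):
    """Hypothetical helper: fill unvoiced frames (F0 == 0) by linear
    interpolation between the neighbouring voiced frames (F0 > 0).

    Returns the continuous F0 contour and the boolean voiced mask.
    """
    f0 = np.asarray(f0, dtype=np.float64)
    voiced = f0 > 0
    if not voiced.any():
        # No voiced frame to interpolate from; return the input unchanged.
        return f0.copy(), voiced
    frames = np.arange(len(f0))
    # np.interp holds the first/last voiced value at the sequence edges.
    continuous = np.interp(frames, frames[voiced], f0[voiced])
    return continuous, voiced


f0 = np.array([0.0, 100.0, 0.0, 0.0, 130.0, 0.0])
continuous_f0, voiced_mask = interpolate_f0(f0)
print(continuous_f0)  # [100. 100. 110. 120. 130. 130.]
```

With this kind of preprocessing the target contour has no jumps to zero, so the voiced/unvoiced decision no longer has to be inferred from the predicted F0 value itself; the mask can be kept separately if it is needed.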
Thank you for your reply
Good work.
I see the following upsampling f0 operation in dataset.py:
Why is this done? Will the naturalness of the synthesized voice decrease if I do not do this?
Thanks for your answer.