zhangyongmao / VISinger2

VISinger 2: High-Fidelity End-to-End Singing Voice Synthesis Enhanced by Digital Signal Processing Synthesizer

Necessity of upsampling F0 during data processing #9

Closed cnlinxi closed 1 year ago

cnlinxi commented 1 year ago

Good work.

I see the following F0 upsampling operation in dataset.py:


f0, _ = self.interpolate_f0(f0)

Why is this done? Will the naturalness of the synthesized voice decrease if I don't do this?

Thanks for your answer.

zhangyongmao commented 1 year ago

This operation interpolates F0. The original F0 contour also encodes a voiced/unvoiced classification: voiced frames have F0 > 0 and unvoiced frames have F0 == 0. If the model predicts this raw F0 directly, it also has to decide implicitly whether each frame is voiced or unvoiced, and mistakes in that decision show up as voiced/unvoiced classification errors. Interpolating F0 across the unvoiced frames yields a continuous contour, which reduces these classification errors.
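A minimal sketch of what F0 interpolation of this kind typically looks like, assuming linear interpolation across unvoiced gaps and holding the nearest voiced value over leading and trailing unvoiced frames. This is not the repository's `interpolate_f0`; the function below is hypothetical, and its two return values (interpolated contour plus a voiced/unvoiced mask) are an assumption that only mirrors the `f0, _ = ...` unpacking in the snippet above.

```python
from typing import Tuple

import numpy as np


def interpolate_f0(f0: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: fill unvoiced frames (F0 == 0) by linear interpolation.

    Returns the continuous F0 contour and a voiced/unvoiced mask (1.0 = voiced).
    """
    f0 = np.asarray(f0, dtype=np.float64)
    vuv = (f0 > 0).astype(np.float64)  # voiced/unvoiced mask
    voiced_idx = np.nonzero(vuv)[0]
    if voiced_idx.size == 0:
        # No voiced frames to interpolate from: return the contour unchanged.
        return f0.copy(), vuv
    frames = np.arange(len(f0))
    # np.interp clamps to the first/last voiced value outside the voiced range,
    # so leading and trailing unvoiced frames are held at the nearest voiced F0.
    f0_interp = np.interp(frames, voiced_idx, f0[voiced_idx])
    return f0_interp, vuv


f0 = np.array([0.0, 0.0, 220.0, 0.0, 0.0, 250.0, 0.0])
f0_interp, vuv = interpolate_f0(f0)
print(f0_interp)  # [220. 220. 220. 230. 240. 250. 250.]
print(vuv)        # [0. 0. 1. 0. 0. 1. 0.]
```

Because the mask is kept separately, no information is lost: the raw contour can be recovered as `f0_interp * vuv`, so the voiced/unvoiced decision no longer has to be inferred from F0 values dropping to zero.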

Skipping interpolation also works, but the model may be more stable when trained on interpolated F0.

cnlinxi commented 1 year ago

Thank you for your reply.