mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Harvest estimates F0 on silent section. #105

Closed: noobar closed this issue 3 years ago

noobar commented 3 years ago

I tested F0 estimation with both examples/analysis_synthesis (DIO) and examples/parameter_io (Harvest), and found that the latter reports F0 values even in the silent section. The following figure shows the estimation results for test/vaiueo2d.wav; the recorded utterance is "aiueo".

I read #35 and learned the following:

In this case, however, I believe the F0 shown above does not fall in an unvoiced section, because the first sound of the recording is the vowel "a".

Is this expected behavior? Should I apply any pre- or post-processing?

Regards

mmorise commented 3 years ago

Harvest attempts to reduce the number of unvoiced frames and to assign them a reliable F0, so this result is expected. If you require accurate boundaries between voiced and unvoiced sections, a separate VAD (voice activity detection) algorithm would be useful.
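One simple form of such post-processing is an energy gate: zero the F0 of any frame whose local signal level is below a threshold. The sketch below is only an illustration and is not part of WORLD. The function name `mask_f0_in_silence`, the 25 ms window, and the -50 dBFS threshold are assumptions chosen for the example; it also assumes samples scaled to [-1, 1], WORLD's default 5 ms frame period, and WORLD's convention that F0 = 0 marks an unvoiced frame. A real VAD would normally be more robust than a fixed energy threshold.

```python
import math


def mask_f0_in_silence(samples, fs, f0, frame_period_ms=5.0,
                       window_ms=25.0, threshold_db=-50.0):
    """Return a copy of f0 with frames set to 0.0 where the local RMS level
    (in dB relative to full scale 1.0) is below threshold_db.

    Hypothetical helper for illustration; thresholds are arbitrary."""
    half = max(1, int(fs * window_ms / 1000.0 / 2))
    masked = []
    for i, value in enumerate(f0):
        # Sample index at the centre of the i-th F0 frame.
        center = int(round(i * frame_period_ms / 1000.0 * fs))
        lo = max(0, center - half)
        hi = min(len(samples), center + half)
        frame = samples[lo:hi]
        if frame:
            rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        else:
            rms = 0.0
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")
        # Keep the estimate only when the frame carries enough energy.
        masked.append(value if level_db >= threshold_db else 0.0)
    return masked


# Toy input: 0.1 s of digital silence followed by 0.1 s of a 200 Hz tone,
# with an F0 track that (like the Harvest result discussed here) is
# non-zero everywhere.
fs = 16000
signal = [0.0] * 1600 + [0.5 * math.sin(2.0 * math.pi * 200.0 * n / fs)
                         for n in range(1600)]
f0_est = [200.0] * 40          # 40 frames at 5 ms = 0.2 s
f0_masked = mask_f0_in_silence(signal, fs, f0_est)
```

With this toy input, the frames that lie entirely in the leading silence come back as 0.0 while the frames inside the tone keep their 200 Hz value. On real recordings the threshold would need tuning against the noise floor.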

Harvest is intended for high-quality speech analysis/synthesis systems and SPSS (statistical parametric speech synthesis). Since the continuous F0 modeling used in SPSS assigns an F0 value to unvoiced sections as well, the F0 contour estimated by Harvest is better suited to this purpose than the one estimated by DIO.
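To make the idea of a continuous F0 contour concrete, the sketch below fills unvoiced frames (F0 = 0 in WORLD's convention) by linear interpolation between the neighbouring voiced frames, holding the nearest voiced value at the edges. This is a minimal illustration under stated assumptions, not the method of any particular SPSS system: the function name `interpolate_unvoiced_f0` is hypothetical, and real systems may interpolate in the log-F0 domain or use splines instead.

```python
def interpolate_unvoiced_f0(f0):
    """Return a continuous F0 contour: frames with f0 <= 0 are filled by
    linear interpolation between the nearest voiced frames.

    Hypothetical helper for illustration only."""
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        # Nothing to interpolate from; return the contour unchanged.
        return list(f0)
    out = list(f0)
    # Leading and trailing unvoiced frames: hold the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Interior gaps: straight line between the surrounding voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = f0[a] + t * (f0[b] - f0[a])
    return out


contour = interpolate_unvoiced_f0([0.0, 100.0, 0.0, 0.0, 130.0, 0.0])
```

For the example input, the two interior zeros are replaced by values of approximately 110 and 120 Hz, and the first and last frames take 100 and 130 Hz. The fewer unvoiced frames the estimator leaves, the less of the contour has to be filled in this way.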

noobar commented 3 years ago

Sorry for the late reply. I understand that this is the expected result. As you suggest, I will need some other tools. Thank you for your explanation :pray: