Harvest estimates F0 on silent section.

noobar commented 4 years ago

I tested F0 estimation with both examples/analysis_synthesis (DIO) and examples/parameter_io (Harvest), and found that the latter reports F0 even in the silent section. The following figure shows the estimation results for test/vaiueo2d.wav; the recorded utterance is "aiueo".

I read #35 and learned the following:

it is not a big problem if F0 is estimated in an unvoiced section
the unvoiced section can be determined from the aperiodicity
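
For reference, the second point could be sketched roughly as below. This is only an illustration under assumptions: the function name, the 0.99 threshold, and the toy values are hypothetical, and it assumes that unvoiced frames show aperiodicity close to 1.0 in every band. It is plain Python and does not call WORLD.

```python
def unvoiced_from_aperiodicity(aperiodicity, threshold=0.99):
    """Return one bool per frame: True when the frame looks unvoiced.

    aperiodicity: list of frames, each a list of per-band values in [0, 1].
    threshold: hypothetical cutoff on the frame's mean aperiodicity.
    """
    flags = []
    for frame in aperiodicity:
        mean_ap = sum(frame) / len(frame)
        flags.append(mean_ap >= threshold)
    return flags


# Toy data (made up): frames 0 and 3 are almost fully aperiodic.
ap = [
    [1.0, 1.0, 1.0],
    [0.05, 0.30, 0.80],
    [0.10, 0.40, 0.90],
    [0.999, 1.0, 1.0],
]
flags = unvoiced_from_aperiodicity(ap)
```

Frames flagged this way could then have their F0 set to 0, if a voiced/unvoiced decision is needed.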

In this case, however, I believe the F0 above does not come from an unvoiced section: the first sound is the vowel "a", so the frames before it are silence.

Is this expected behavior? Should I apply any pre- or post-processing?

Regards

mmorise commented 4 years ago

Harvest tries to reduce the number of unvoiced frames and to assign them a reliable F0, so this result is expected. If you need accurate boundaries between voiced and unvoiced sections, a separate VAD (voice activity detection) algorithm would be useful.
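
As a rough illustration of such post-processing, the sketch below zeroes F0 in frames whose short-time level is low. It is a very simple energy-based stand-in for a real VAD, not part of WORLD: the function name and the -40 dB threshold are hypothetical, and it assumes samples scaled to [-1, 1] and a 5 ms frame period.

```python
import math


def mask_f0_by_energy(samples, f0, fs, frame_period_ms=5.0, threshold_db=-40.0):
    """Zero out F0 in frames whose short-time level is below threshold_db.

    samples: waveform as floats in [-1, 1]; f0: one value per frame (Hz).
    The function name and threshold_db are hypothetical, not part of WORLD.
    """
    hop = max(1, int(round(fs * frame_period_ms / 1000.0)))
    masked = []
    for i, value in enumerate(f0):
        center = i * hop
        # Window of up to 2 * hop samples around the frame position.
        window = samples[max(0, center - hop):center + hop]
        if not window:
            masked.append(0.0)
            continue
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        level_db = 20.0 * math.log10(max(rms, 1e-12))  # dB re full scale
        masked.append(value if level_db >= threshold_db else 0.0)
    return masked


# Toy signal (made up): 50 ms of silence followed by 50 ms of a 200 Hz sine.
fs = 16000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
samples = silence + tone
f0 = [200.0] * 20  # pretend the estimator reported 200 Hz in every frame
masked = mask_f0_by_energy(samples, f0, fs)
```

With real recordings the threshold would need tuning against the noise floor, and a dedicated VAD is likely to be more robust than a fixed level.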

Harvest is designed for high-quality speech analysis/synthesis systems and SPSS (statistical parametric speech synthesis). Since the continuous F0 modeling used in SPSS assigns an F0 value even to unvoiced sections, the F0 contour estimated by Harvest suits this purpose better than the one estimated by DIO.
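
To illustrate what a continuous F0 contour means here, one common way to build it from a contour with zeros in unvoiced frames is linear interpolation. The sketch below shows that preprocessing step only; it is not how Harvest works internally, and the function name and toy values are hypothetical.

```python
def interpolate_f0(f0):
    """Fill zero (unvoiced) frames by linear interpolation between voiced frames.

    Leading and trailing unvoiced frames copy the nearest voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v > 0.0]
    if not voiced:
        return list(f0)  # nothing to interpolate from
    out = list(f0)
    first, last = voiced[0], voiced[-1]
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            weight = (i - left) / gap
            out[i] = f0[left] + weight * (f0[right] - f0[left])
    return out


# Toy contour (made up): zeros mark unvoiced frames.
f0 = [0.0, 100.0, 0.0, 120.0, 0.0]
continuous = interpolate_f0(f0)
```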

noobar commented 4 years ago

Sorry for the late reply. I understand that this is the expected result. As you suggest, I will need another tool for this. Thank you for your explanation :pray:

mmorise / World

Harvest estimates F0 on silent section. #105