noobar closed this issue 4 years ago
Harvest tries to reduce the number of unvoiced frames and assign them a reliable F0, so this result is expected. If you need accurate boundaries between voiced and unvoiced sections, a separate VAD (voice activity detection) algorithm would be useful.
Harvest is intended for high-quality speech analysis/synthesis systems and SPSS (statistical parametric speech synthesis). Since the continuous F0 modeling used in SPSS assigns an F0 value even to unvoiced sections, the F0 contour estimated by Harvest suits this purpose better than the one estimated by Dio.
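As one possible post-processing step when sharper voiced/unvoiced boundaries are needed, the F0 contour can be gated by frame energy. The following is only a minimal sketch of a crude energy-based VAD, not something provided by WORLD: the function name `mask_f0_by_energy`, the 5 ms frame period, and the -40 dB threshold are all assumptions chosen for illustration, and gated frames are marked with 0 on the assumption that this matches the unvoiced convention of the contour being used.

```python
import numpy as np


def mask_f0_by_energy(x, fs, f0, frame_period_ms=5.0, threshold_db=-40.0):
    """Set F0 to 0 in frames whose RMS energy is far below the loudest frame.

    x: mono waveform, fs: sampling rate in Hz, f0: one F0 value per frame.
    frame_period_ms and threshold_db are illustrative defaults (assumptions).
    """
    x = np.asarray(x, dtype=np.float64)
    f0 = np.asarray(f0, dtype=np.float64)
    hop = int(round(fs * frame_period_ms / 1000.0))

    # Pad so that a window of 2 * hop samples can be centered on every frame.
    padded = np.pad(x, (hop, hop))
    rms = np.zeros(len(f0))
    for i in range(len(f0)):
        frame = padded[i * hop:i * hop + 2 * hop]
        if frame.size:
            rms[i] = np.sqrt(np.mean(frame ** 2))

    ref = rms.max()
    if ref == 0.0:
        # Completely silent input: nothing can be voiced.
        return np.zeros_like(f0)

    # Frame level in dB relative to the loudest frame.
    rel_db = 20.0 * np.log10(np.maximum(rms, 1e-12) / ref)
    return np.where(rel_db > threshold_db, f0, 0.0)


# Usage: 0.5 s of silence followed by 0.5 s of a 200 Hz tone, with an F0
# contour that (like the reported Harvest output) is nonzero everywhere.
fs = 16000
t = np.arange(fs // 2) / fs
x = np.concatenate([np.zeros(fs // 2), 0.5 * np.sin(2 * np.pi * 200.0 * t)])
f0 = np.full(len(x) // 80 + 1, 150.0)
masked = mask_f0_by_energy(x, fs, f0)
```

A fixed relative threshold like this is easily fooled by background noise or quiet voiced segments, so a dedicated VAD is likely the better choice for real recordings.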
Sorry for the late reply. I understand that this is the expected result. As you suggest, I will need some other tools. Thank you for your explanation :pray:
I tested F0 estimation with both `examples/analysis_synthesis` (DIO) and `examples/parameter_io` (Harvest), and found that the latter result shows F0 even in the silent section. The following figure shows the estimation results for `test/vaiueo2d.wav`; the recorded sentence is "aiueo".
I read #35 and learned the following:
But in this case, I believe the F0 above is not in an unvoiced section, because the first sound is the vowel "a".
Is this expected behavior? Should I apply any pre- or post-processing?
Regards