mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Added isnan() check for tentative_f0 in GetTentativeF0 #85

Closed renerocksai closed 4 years ago

renerocksai commented 5 years ago

This rejects nan/-nan tentative_f0 values. I got these on long (50 s) WAVs of synthesized speech.
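For illustration, a guard of this kind might look like the sketch below. It is not the actual patch: the helper name `IsUsableTentativeF0` and the `f0_floor`/`f0_ceil` parameters are assumptions made up for this example; only `tentative_f0` and the use of `isnan()` come from this PR.

```cpp
#include <cmath>

// Hypothetical helper (not part of WORLD): returns true only when a
// tentative F0 candidate is a real number inside [f0_floor, f0_ceil].
// NaN already fails every ordered comparison, so the explicit
// std::isnan() test mainly makes the intent visible.
inline bool IsUsableTentativeF0(double tentative_f0, double f0_floor,
                                double f0_ceil) {
  if (std::isnan(tentative_f0)) return false;  // rejects nan and -nan
  return tentative_f0 >= f0_floor && tentative_f0 <= f0_ceil;
}
```

A caller would skip any candidate for which this returns false instead of passing it on to later stages.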

mmorise commented 5 years ago

Sorry, I can't check the program until Thursday. I will do this work this weekend. Since it is a little strange that nan/-nan values are observed, I'd like to check the cause.

renerocksai commented 5 years ago

It might help that I used the analysis example tool from merlin. Feeding it a 50s WAV of synthesized speech containing "stuttering" caused analysis to dump core.

Further analysis (no pun intended) revealed that, using the current analysis tool from examples/analysis_synthesis, no core is dumped with or without isnan().

However, in either case, when using the current analysis tool on said WAV, SPTK's x2x tool complains when converting the .lf0 file:

x2x : error: input data is over the range of type 'float'!

I don't know what to make of that, but one could argue that this is a Merlin-specific problem, too, since the use of analysis in combination with x2x is part of Merlin's scripts: extract_features_for_merlin.sh.

So, potentially, -nan values will not occur with the current version; however, that "over the range of type 'float'" error raises suspicion.

I conclude that the isnan() check belongs in Merlin's repository, which contains an old copy of WORLD.

mmorise commented 5 years ago

Thank you for your information. I think that the duration of the waveform does not cause the error.

We can observe the error in x2x when the absolute amplitude is over FLT_MAX. However, if this is the cause of the error, we should check the amplitude of the waveform before analyzing. The latest version has another error checker, so it seems that the isnan() check is redundant. If you have an example .wav file that causes the error, please send it to me. I'll check it and debug the program with your idea if needed.
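A check of the amplitude before analysis could look roughly like the following sketch. The function name `FindUnsafeSample` is hypothetical and not part of WORLD or SPTK; the sketch assumes the waveform is held as an array of `double`, and it reports the first sample that is NaN, infinite, or larger in magnitude than FLT_MAX.

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical helper: returns the index of the first sample that is not
// finite or whose magnitude does not fit into a float, or -1 if every
// sample can be converted to float safely.
inline int FindUnsafeSample(const double *x, int x_length) {
  for (int i = 0; i < x_length; ++i) {
    if (!std::isfinite(x[i]) || std::fabs(x[i]) > FLT_MAX) return i;
  }
  return -1;
}
```

The same scan could in principle be run over any other array of doubles, for example an F0 contour, before it is written out and handed to a float conversion.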

renerocksai commented 5 years ago

Thanks for looking into this and for the additional information. I didn't mean the duration per se; I just wanted to give an indication of some properties of the waveform. More precisely, it is one German sentence of synthesized speech from a children's book, where the last part of the sentence unintentionally gets repeated many times until the file is 50 s long. Just by listening to it, I haven't found anything weird regarding the amplitude.

I have uploaded the file; you can download it here: 0013.zip

However, plotting the waveform (I used TwistedWave Online Audio Editor since I am on a Chromebook) revealed that it is maxed out at 7397.xx ms:

Screenshot1

Screenshot2 (Screenshot 2019-08-26 at 16 35 55)

This suggests that the amplitude exceeds FLT_MAX, and hence an amplitude check would be a good idea.

mmorise commented 5 years ago

Thank you for your information and the attached example.

I used the test program test.cpp in the latest version, but I could not reproduce the error. If you changed any options, please give me the settings so that I can trace the error.