mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Reason for empty analysis files? #36

Closed dreamk73 closed 7 years ago

dreamk73 commented 7 years ago

I have attached a zip file with two waveforms from the same acoustic database. The file sn001_sent006.wav is totally fine and I get analysis files. The file sn001_sent007.wav does not look any different to me, but the analysis fails and the output files are empty. Any idea what is going on? I have tried changing the lower and upper boundaries for the F0, but it has no effect. I am using the extract_features_for_merlin.sh script in Merlin.

ex.zip

mmorise commented 7 years ago

I'm afraid that I could not confirm the result that you pointed out. I could analyze/synthesize both files (but the sound quality was not good...). I could also obtain the output files.

If you used the latest version of WORLD, please give me more information (e.g. the example program you used for analysis/synthesis and the commands). Since I provide several examples, I'd like to know which program you used.

Best regards,

dreamk73 commented 7 years ago

I don't understand why the sound quality would not be good. I had 22kHz files which are studio quality. I used Festival/speech tools ch_wave to create 16kHz versions.

I used the latest version of WORLD in the Merlin repository, changing floorF0 and ceilF0 to 71 Hz and 150 Hz in test/world/constantnumbers.h.

mmorise commented 7 years ago

I attached the speech processed with default parameters. As you can see, the sound quality is not good.

I tried your parameters and obtained speech that is superior to the attached one. However, there seem to be several errors in voiced/unvoiced (VUV) detection. Parameter tuning may solve this problem.

output.zip

dreamk73 commented 7 years ago

Thanks for taking a look at this. Can you explain a little bit more which parameters you would tune? And how do you find the best parameters for a given voice / recording?

Do you have to set them in the constantnumbers.h file or is it possible to add flags to the analysis executable? It would be nice to not have to recompile the code every time we want to try it on a new voice.

Also, any idea why I would not get any parameter values at all for the one file I sent you using the analysis script in Merlin?

mmorise commented 7 years ago

Since the best parameters depend on the speech, parameter tuning is generally carried out by trial and error based on its characteristics. In many cases, the floor and ceiling frequencies in F0 estimation are the values that are adjusted. You can control these values by using the example at examples/codec_test/f0analysis.cpp. One known approach is to (1) analyze the speech with the default parameters, (2) set the frequency range based on the result, and (3) re-analyze the speech with the tuned parameters. For example, since the frequency range depends on gender, limiting the range can improve the estimation performance.
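Step (2) of the approach above could be sketched roughly as follows. This is only an illustration in plain Python, not part of WORLD or Merlin: the helper name `tuned_f0_range`, the percentile choices and the 20% margin are all assumptions, and it assumes the first-pass F0 track marks unvoiced frames with 0.

```python
def tuned_f0_range(f0_track, low_pct=5.0, high_pct=95.0, margin=0.2):
    """Suggest a (floor, ceil) pair in Hz from a first-pass F0 track.

    Hypothetical helper: frames with F0 <= 0 are treated as unvoiced
    and ignored. Returns None if no voiced frame is found.
    """
    voiced = sorted(f for f in f0_track if f > 0.0)
    if not voiced:
        return None  # nothing voiced: keep the default range

    def percentile(p):
        # Pick the sorted voiced value whose index is closest to p percent.
        idx = round(p / 100.0 * (len(voiced) - 1))
        return voiced[idx]

    # Widen the observed range by a relative margin so that the second
    # pass does not clip frames near the edges of the speaker's range.
    floor_hz = percentile(low_pct) * (1.0 - margin)
    ceil_hz = percentile(high_pct) * (1.0 + margin)
    return floor_hz, ceil_hz
```

The returned pair would then be used as the floor and ceiling for the second analysis pass. Using percentiles rather than the raw minimum and maximum is a design choice meant to keep a few octave errors in the first pass from stretching the range.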

I'm afraid that I have never used Merlin, so I can't answer your last question. All I can do is check the programs for speech analysis from the waveform and for waveform synthesis from the three parameters.

dreamk73 commented 7 years ago

Thanks. The code in Merlin is just your WORLD code. But the test/analysis.cpp program does not have flags for the floor and ceiling etc. Maybe we can rewrite it or update it with your latest code automatically.

dreamk73 commented 7 years ago

I still can't reproduce your results. I keep ending up with one or two empty files. It does seem that the F0 dips under 70 Hz in a few cases. If you change the floor to below 70 Hz, is there another parameter that needs to be changed as well?
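When only one or two files in a batch come out empty, a quick scan of the output directory can help pin down which inputs are affected. The following is a minimal sketch in plain Python; the function name `find_empty_outputs` and the default suffixes are assumptions for illustration and should be adapted to whatever the analysis script actually writes.

```python
from pathlib import Path


def find_empty_outputs(out_dir, suffixes=(".f0", ".sp", ".ap")):
    """Return the sorted paths of zero-byte files in out_dir.

    Hypothetical helper: only regular files whose suffix is in
    `suffixes` are considered (the defaults are assumed names).
    """
    return sorted(
        p
        for p in Path(out_dir).iterdir()
        if p.is_file() and p.suffix in suffixes and p.stat().st_size == 0
    )
```

Running the analysis again on just the files listed here, with different floor and ceiling values, keeps the trial-and-error loop short.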

mmorise commented 7 years ago

Did you test the examples "examples/codec_test/f0analysis.cpp", "spanalysis.cpp" and "apanalysis.cpp"? I confirmed that the three parameters were successfully obtained even when the default parameters were used. If you can successfully obtain the parameters with them, the cause would be the program you used. Otherwise, please give me information about your development environment, the program you used, and the commands for execution.