mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

NaN sometimes introduced in coarse aperiodicity estimation #92

Closed. 3628800 closed this issue 4 years ago

3628800 commented 4 years ago

This can occur when the input signal contains a strong single-frequency component in one or more of the bands of interest (for example, a pure 440 Hz sine wave in part of the recording).

Example file: 440.zip
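For readers without the attachment, a comparable input can be synthesized. The following is a minimal sketch, not the reporter's file: the function name `tone_in_part`, the sample rate, the amplitude, and the position of the tone are all assumptions chosen for illustration.

```python
import math

def tone_in_part(fs=44100, total_seconds=1.0, tone_start=0.25, tone_end=0.75,
                 freq=440.0, amplitude=0.5):
    """Return double-precision samples: silence with a pure sine in the middle.

    All defaults are illustrative assumptions, not values from the issue.
    """
    n = int(round(fs * total_seconds))
    first = int(round(fs * tone_start))
    last = int(round(fs * tone_end))
    return [
        amplitude * math.sin(2.0 * math.pi * freq * i / fs) if first <= i < last else 0.0
        for i in range(n)
    ]

# One second at 44.1 kHz with a 440 Hz tone between 0.25 s and 0.75 s.
signal = tone_in_part()
```

The samples stay as Python floats (C doubles), so nothing is quantized before the signal reaches an analysis routine.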

mmorise commented 4 years ago

Thank you for your request. Please give me a few days, because I'm too busy to check it right now.

I think your idea is reasonable. On the other hand, I'd like to check whether we can avoid the NaN value in the power_spectrum with a simpler approach.
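As a generic illustration of what a simple guard could look like, the sketch below floors a power spectrum before it is used in a ratio or logarithm. It is not taken from World's source and is not necessarily the approach discussed here; `POWER_FLOOR`, `floor_power_spectrum`, and `band_ratio_db` are hypothetical names.

```python
import math

POWER_FLOOR = 1e-12  # hypothetical floor value; not a constant from World

def floor_power_spectrum(power_spectrum, floor=POWER_FLOOR):
    """Replace NaN and values at or below `floor` with `floor`.

    The comparison `p > floor` is False for NaN, so NaN entries are
    replaced too. Larger finite values pass through unchanged.
    """
    return [p if p > floor else floor for p in power_spectrum]

def band_ratio_db(numerator, denominator):
    """10*log10 of a power ratio; well defined once both inputs are floored."""
    return 10.0 * math.log10(numerator / denominator)

spectrum = floor_power_spectrum([0.0, float("nan"), 2.0])
ratio = band_ratio_db(spectrum[0], spectrum[2])
```

A floor like this hides the symptom rather than explaining it, so it is best paired with a reproducible test case that shows where the NaN first appears.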

mmorise commented 4 years ago

I attempted to replicate your error using the attached audio file, but no NaN was observed. Since the audio file is stereo, I used channel 1 of the file, which contains the 440 Hz pure tone.

If possible, please give me more information, such as your OS. I tested the program on a Windows laptop.

3628800 commented 4 years ago

Thanks for your investigation. I was testing on both Windows and Mac. I cannot reproduce the NaN with the file I attached either: I discovered that exporting the file silently re-encoded the sample values as signed 16-bit integers. I do observe NaN with the newly attached file (containing the mean of the two channels, properly encoded). Using a window size of 2048 and an offset of 512 samples, it appears in the 10th array of the result (counting from 1). 440-mean-double.wav.zip
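The effect of such a re-encoding can be illustrated with a short sketch. It quantizes a double-precision 440 Hz tone to signed 16-bit integers and back, then counts how many samples changed. It also maps a 1-based frame number to a sample range, assuming that "offset" means the hop between consecutive windows and that the first window starts at sample 0. Neither assumption is confirmed in the thread, and the helper names, sample rate, and amplitude are hypothetical.

```python
import math

def to_int16(x):
    """Quantize a sample in [-1.0, 1.0) to a signed 16-bit integer, with clipping."""
    return max(-32768, min(32767, int(round(x * 32768.0))))

def from_int16(q):
    """Convert a signed 16-bit integer back to a double in [-1.0, 1.0)."""
    return q / 32768.0

def frame_sample_range(frame_number, window_size=2048, hop=512):
    """Return (start, stop) sample indices for a 1-based frame number.

    Assumes frame 1 starts at sample 0 and frames advance by `hop` samples.
    """
    start = (frame_number - 1) * hop
    return start, start + window_size

fs = 44100  # assumed sample rate
original = [0.5 * math.sin(2.0 * math.pi * 440.0 * i / fs) for i in range(2048)]
round_tripped = [from_int16(to_int16(x)) for x in original]

changed = sum(1 for a, b in zip(original, round_tripped) if a != b)
max_error = max(abs(a - b) for a, b in zip(original, round_tripped))
tenth_frame = frame_sample_range(10)
```

The per-sample error is bounded by half a quantization step, yet the round-tripped values are no longer bit-identical to the originals, which may be enough to change whether an edge case is triggered.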

mmorise commented 4 years ago

Sorry, I could not reproduce the error. Could you share your test code? I set the analysis parameters according to your comments, but they may differ from the parameters you used.

JeremyCCHsu commented 4 years ago

Hi @mmorise, not sure if this is related, but a user reported a similar issue (NaN in the aperiodicity, AP) on the Python side. https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder/issues/50

It seems to be a problem with Python's type casting (float to double), but I don't really know what the root cause is. I'm reposting it here just for your information. Please let me know if anything comes to mind. Thanks.
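For context on what a float-to-double cast does to sample values, here is a minimal sketch using only the standard library. It does not reproduce the linked report and makes no claim about its root cause; `as_float32` is a hypothetical helper.

```python
import struct

def as_float32(x):
    """Round a Python float (a C double) to the nearest 32-bit float.

    The result is returned as a Python float, i.e. widened back to double.
    """
    return struct.unpack("f", struct.pack("f", x))[0]

x = 0.1
widened = as_float32(x)

# Widening to double keeps the float32-rounded value; it does not
# recover the digits that were lost when the value was stored as float32.
difference = abs(widened - x)
exactly_representable = as_float32(0.5) == 0.5
```

A signal that passed through 32-bit floats is therefore a slightly different input than the same signal computed directly in double precision.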

mmorise commented 4 years ago

I tested the audio file (bed (537).wav) but could not obtain a NaN value in the aperiodicity. I should be able to solve this problem once I can reproduce the error with an audio file and test code.