mmorise / World

A high-quality speech analysis, manipulation and synthesis system
http://www.kisc.meiji.ac.jp/~mmorise/world/english

Pitch prediction output length #128

Closed tebin closed 3 years ago

tebin commented 3 years ago

There seems to be an extra time step at the end, and I don't understand where it comes from.

Consider this example using the Python wrapper.

>>> import numpy as np
>>> wav = np.ones(16000).astype(np.float64)
>>> sr = 16000
>>> wav[sr]
IndexError: index 16000 is out of bounds for axis 0 with size 16000

This is expected, since the last valid index is 15999.

Now I run dio or harvest on this audio with a hop size of 10 ms (160 samples):

>>> import pyworld
>>> f0, timesteps = pyworld.harvest(wav, sr, frame_period=10.0)
>>> len(f0)
101
>>> timesteps
array([0., 0.01, ..., 0.99, 1.0])

According to #99, f0 values should be evaluated at samples 0, 160, 320, ..., 15840, 16000. Since index 16000 is out of bounds, I would expect the analysis to stop at 15840, giving 100 f0 values and no final timestep of 1.0.
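
The two counting conventions in question can be compared with plain Python. This is only a sketch of the index arithmetic; `frame_positions` is a hypothetical helper for illustration and is not part of pyworld.

```python
def frame_positions(num_samples, hop, include_endpoint):
    """Sample indices at which f0 would be evaluated (hypothetical helper)."""
    # Including the endpoint allows a frame exactly at index num_samples.
    stop = num_samples + 1 if include_endpoint else num_samples
    return list(range(0, stop, hop))

sr = 16000
hop = 160  # 10 ms at 16 kHz

inside = frame_positions(sr, hop, include_endpoint=False)
with_end = frame_positions(sr, hop, include_endpoint=True)

print(len(inside), inside[-1])      # 100 15840
print(len(with_end), with_end[-1])  # 101 16000
```

The observed output of 101 values matches the second convention, where a frame is placed at the very end of the signal.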

Can you please explain why the length is 101 and not 100? Or is this unintended behavior, and therefore an issue with the Python wrapper?

mmorise commented 3 years ago

This is not a bug; it is the intended design. WORLD provides functions for synthesis as well as analysis, and the analyzed parameters need to be slightly longer than the input so that the input and output signals have the same length. The function is tuned so that it never accesses out-of-bounds samples.
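
As a rough illustration of this reply, the observed length of 101 is consistent with counting frames as floor(duration / frame_period) + 1, so that the frames span the whole input. The sketch below is an assumption inferred from the output shown in this issue, not a copy of WORLD's source; `num_frames` and `time_axis` are hypothetical names.

```python
def num_frames(num_samples, fs, frame_period_ms):
    # Assumed rule: one frame at t = 0 plus one per full frame period.
    return int(1000.0 * num_samples / fs / frame_period_ms) + 1

def time_axis(n, frame_period_ms):
    # Frame times in seconds, starting at 0.
    return [i * frame_period_ms / 1000.0 for i in range(n)]

def span_in_samples(n, fs, frame_period_ms):
    # Distance in samples from the first frame to the last frame.
    return round((n - 1) * frame_period_ms / 1000.0 * fs)

fs = 16000
n = num_frames(16000, fs, 10.0)
t = time_axis(n, 10.0)

print(n, t[-1])                            # 101 1.0
print(span_in_samples(n, fs, 10.0))        # 16000
print(span_in_samples(n - 1, fs, 10.0))    # 15840
```

With only 100 frames the parameter track would end 160 samples short of the 16000-sample input, which is presumably why the extra frame is kept for synthesis.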