jurihock / stftPitchShift

STFT based real-time pitch and timbre shifting in C++ and Python
MIT License

Quefrency - What is the suggested range? #52

Open pacomacman opened 2 months ago

pacomacman commented 2 months ago

I'm struggling with the Quefrency setting. Although the examples set it to 0.001 = 1ms, you don't mention anywhere what the suggested range is, or whether this is a linear or exponential setting.

jurihock commented 2 months ago

The specified quefrency value is used as a threshold, to separate the fundamental frequency from harmonics in the cepstral domain.

To enable the formant preservation feature, specify a suitable quefrency value in milliseconds. Depending on the source signal, begin with a small value like -q 1. Generally, the quefrency value has to be smaller than the fundamental period of the source signal, which is the reciprocal of its fundamental frequency.
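To make the reciprocal relation concrete, here is a minimal Python sketch. The helper names and the 200 Hz example voice are illustrative assumptions and are not part of stftPitchShift.

```python
def fundamental_period_ms(f0_hz: float) -> float:
    """Fundamental period in milliseconds, i.e. the reciprocal of f0."""
    if f0_hz <= 0:
        raise ValueError("fundamental frequency must be positive")
    return 1000.0 / f0_hz


def is_below_fundamental_period(quefrency_ms: float, f0_hz: float) -> bool:
    """Check the rule of thumb: 0 < quefrency < fundamental period."""
    return 0 < quefrency_ms < fundamental_period_ms(f0_hz)


# Assumed example: a voice with a 200 Hz fundamental has a 5 ms period.
print(fundamental_period_ms(200.0))             # 5.0
print(is_below_fundamental_period(1.0, 200.0))  # True, like "-q 1"
print(is_below_fundamental_period(6.0, 200.0))  # False, longer than the period
```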

The example value of 1ms seems to work quite well for the example file voice.wav. The most convenient way is probably to read a suitable quefrency value from the cepstrogram, like in this Python sketch.
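For readers without the plot at hand, the idea of reading the fundamental period off the cepstrum can be sketched for a single frame. This is not the linked sketch and not code from stftPitchShift. It assumes a synthetic 200 Hz harmonic tone instead of voice.wav, all function names are hypothetical, and it uses a naive standard-library DFT only to stay self-contained. In practice an FFT from NumPy would be the usual choice.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, fine for one short frame."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    # Precompute the N roots of unity once and index them modulo N.
    w = [cmath.exp(sign * 2j * math.pi * k / n) for k in range(n)]
    out = [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Hann window to reduce spectral leakage.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    # A small offset avoids log(0) for empty bins.
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(windowed)]
    return [v.real for v in dft(log_mag, inverse=True)]


def peak_quefrency_ms(frame, sample_rate, min_ms, max_ms):
    """Quefrency (ms) of the largest cepstral value within [min_ms, max_ms]."""
    ceps = real_cepstrum(frame)
    lo = round(min_ms * sample_rate / 1000)
    hi = round(max_ms * sample_rate / 1000)
    peak = max(range(lo, hi + 1), key=lambda i: ceps[i])
    return 1000.0 * peak / sample_rate


# Assumed test signal: 19 harmonics of 200 Hz with 1/k amplitudes at 8 kHz.
SAMPLE_RATE = 8000
F0_HZ = 200.0
frame = [sum(math.sin(2 * math.pi * F0_HZ * k * t / SAMPLE_RATE) / k
             for k in range(1, 20))
         for t in range(512)]

# The fundamental period of a 200 Hz tone is 5 ms.
period_ms = peak_quefrency_ms(frame, SAMPLE_RATE, min_ms=2.5, max_ms=8.0)
print(f"cepstral peak at about {period_ms:.2f} ms")
```

A quefrency threshold would then be picked somewhere below the observed peak, in line with the rule of thumb that it should be smaller than the fundamental period.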

I'll think about adding a cepstrogram plot to the Python version of the stftPitchShift in the next release. So we can reserve this issue for that.

pacomacman commented 2 months ago

Thanks for getting back to me so quickly. While I understand that a value of 1ms is a great place to start, I assume values of less than 1ms are perfectly acceptable. What would you say is a good upper limit (say 3ms)?

Obviously the value of 1ms works for your voice.wav, but I guess people will want to give users the choice of setting this themselves alongside the quefrency setting, so some range is required. So I'm really just wanting to understand what is an acceptable range in your opinion?

jurihock commented 2 months ago

Since formant preservation is primarily intended for the human voice, I think the physiological vocal range could be a good starting point to roughly limit the quefrency threshold range.
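As a rough illustration of how a vocal range translates into quefrency limits, the sketch below converts a fundamental frequency range into the corresponding range of fundamental periods. The 80 Hz and 1000 Hz bounds are assumptions chosen purely for the example, not values from stftPitchShift, and the function name is hypothetical.

```python
# Assumed, illustrative bounds for the fundamental frequency of the human voice.
F0_LOW_HZ = 80.0
F0_HIGH_HZ = 1000.0


def period_range_ms(f0_low_hz: float, f0_high_hz: float) -> tuple[float, float]:
    """Return (shortest, longest) fundamental period in ms for an f0 range."""
    if not 0 < f0_low_hz <= f0_high_hz:
        raise ValueError("expected 0 < f0_low_hz <= f0_high_hz")
    return 1000.0 / f0_high_hz, 1000.0 / f0_low_hz


shortest_ms, longest_ms = period_range_ms(F0_LOW_HZ, F0_HIGH_HZ)
print(shortest_ms, longest_ms)  # 1.0 12.5
```

Under these assumptions, a threshold above 12.5 ms would exceed the fundamental period of every voice in the range, while a threshold below 1 ms stays under the period even for the highest one. Where the threshold should sit in between still depends on the actual source signal.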