Improve phase vocoder - Githubissues

jurihock / stftPitchShift

STFT based real-time pitch and timbre shifting in C++ and Python

MIT License

120 stars, 15 forks

Improve phase vocoder #23

Open jurihock opened 2 years ago

jurihock commented 2 years ago

Just as an alternative idea to the phase vocoder...

DFT magnitude based phase estimation:

However:

SPSI doesn't work well with SDFT and STFT at smaller hops like 2048/32 (as tested in voyx).
PGHI appears to be 6-8 times slower than SPSI.

jurihock commented 2 years ago

The arctan approximation [3] is still faster than std::arg (about 50 ms difference for the default voice sample), but of course less accurate, e.g. compared to the Python implementation. ~~Since accuracy is more important to me at this point, the arctan approximation will not be implemented yet.~~ Done in #40.
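To make the speed versus accuracy trade-off concrete, here is a minimal Python sketch of an approximate `atan2`. The formula from [3] is not reproduced in this thread, so the sketch uses a commonly cited quadratic arctangent fit as a stand-in; the function names `atan2_approx` and `max_error` are hypothetical and not part of stftPitchShift.

```python
import math

def atan2_approx(y: float, x: float) -> float:
    """Approximate math.atan2(y, x).

    Assumption: uses the quadratic fit atan(r) ~ r * (pi/4 + 0.273 * (1 - r))
    for r in [0, 1], which is NOT necessarily the approximation from [3].
    """
    if x == 0.0 and y == 0.0:
        return 0.0
    ax, ay = abs(x), abs(y)
    # Reduce to a ratio r in [0, 1], where the quadratic fit is valid.
    r = min(ax, ay) / max(ax, ay)
    angle = r * (math.pi / 4 + 0.273 * (1.0 - r))
    # Undo the reduction: second octant first, then the remaining quadrants.
    if ay > ax:
        angle = math.pi / 2 - angle
    if x < 0.0:
        angle = math.pi - angle
    if y < 0.0:
        angle = -angle
    return angle

def max_error(steps: int = 3600) -> float:
    """Largest absolute deviation from math.atan2 on a grid over the unit circle."""
    worst = 0.0
    for i in range(steps):
        phi = -math.pi + 2.0 * math.pi * (i + 0.5) / steps
        y, x = math.sin(phi), math.cos(phi)
        worst = max(worst, abs(atan2_approx(y, x) - math.atan2(y, x)))
    return worst
```

Calling `max_error()` gives a rough feel for the phase error such an approximation introduces, on the order of a few milliradians for this particular fit. Whether that is audible after pitch shifting depends on the material.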

Regarding [1], my current observation is that the sliding vocoder generally produces fewer artifacts, especially when pitch shifting instrumental recordings. So it makes more sense to explore the sliding DFT first instead of obfuscating the vocoder...
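For readers unfamiliar with the sliding DFT, the basic recurrence can be sketched in a few lines of Python. This is only the textbook per-sample update, $X_k[n] = (X_k[n-1] + x[n] - x[n-N]) \cdot e^{j 2 \pi k / N}$, without windowing or any vocoder logic; the names `sliding_dft` and `dft` are hypothetical and do not refer to code in this repository.

```python
import cmath
from collections import deque

def dft(frame):
    """Direct DFT of one frame, used here only as a reference."""
    size = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * m / size)
                for m, x in enumerate(frame))
            for k in range(size)]

def sliding_dft(samples, size):
    """Yield the DFT of the most recent `size` samples after every new sample."""
    twiddles = [cmath.exp(2j * cmath.pi * k / size) for k in range(size)]
    window = deque([0.0] * size, maxlen=size)  # history starts as silence
    bins = [0j] * size
    for x in samples:
        oldest = window[0]       # sample that leaves the window
        window.append(x)         # maxlen drops the oldest sample automatically
        delta = x - oldest
        # Add the new sample, remove the old one, rotate each bin by one step.
        bins = [(b + delta) * w for b, w in zip(bins, twiddles)]
        yield list(bins)
```

Each input sample costs one complex multiply-add per bin, which is what makes a hop size of a single sample feasible in the first place.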

jurihock commented 1 year ago

Idea:

Use $\log(a \cdot e^{j \phi}) = \log(a) + j \phi$ instead of explicit std::abs and std::arg calls, since both log-amplitude and phase are needed anyway.
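The identity can be checked with a small Python sketch using the standard `cmath` module, whose complex logarithm returns the log-magnitude in the real part and the principal phase in the imaginary part. The helper name `log_polar` is hypothetical and only serves this illustration.

```python
import cmath
import math

def log_polar(z: complex) -> tuple[float, float]:
    """Return (log-amplitude, phase) of z from a single complex logarithm."""
    w = cmath.log(z)  # log|z| + j * arg(z)
    return w.real, w.imag

# Build a bin value with known amplitude and phase, then recover both at once.
z = cmath.rect(2.0, 0.75)
log_amplitude, phase = log_polar(z)
```

Note that `cmath.log` raises `ValueError` for an exactly zero input, so empty bins would need separate handling in a sketch like this.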

jurihock commented 10 months ago

The pitch shifting result is comparable to that of signalsmith:

make
./out/main hybrid-phase --trim --freq=2 input.wav output.wav

where input.wav is the original dno-solo example converted to 16-bit mono wav.

Actually, the default signalsmith configuration uses 6144 (3072 without zero padding) DFT bins, which is not a power of two. Disabling multipleTimeObservations and zeroPadding makes no noticeable difference.

The comparable stftPitchShift configuration is -w 8k -v 4.