Open jurihock opened 2 years ago
Just as an alternative idea to the phase vocoder...
DFT magnitude based phase estimation:
However:
The arctan approximation [3] is still faster than `std::arg` (about 50 ms difference in the case of the default voice sample), but of course less accurate, e.g. compared to the Python implementation. Since accuracy is more important to me at this point, the arctan approximation will not be implemented yet. Done in #40.
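To illustrate the speed versus accuracy trade-off, here is a minimal sketch of one well-known polynomial arctan approximation. It is not necessarily the formula from [3], and the names `approx_atan` and `approx_arg` are hypothetical and do not come from the project. The error is expected to be on the order of a few thousandths of a radian, which is the kind of deviation from `std::arg` mentioned above.

```cpp
#include <cmath>
#include <complex>

// Approximates atan(x) for |x| <= 1 with a second-order polynomial.
// Assumption: this is a generic textbook approximation, not the one from [3].
inline double approx_atan(double x)
{
  const double pi4 = 0.78539816339744831; // pi / 4
  return x * (pi4 + 0.273 * (1.0 - std::abs(x)));
}

// Approximate replacement for std::arg, result in [-pi, pi].
// The ratio passed to approx_atan is always kept within [-1, 1].
inline double approx_arg(const std::complex<double>& z)
{
  const double pi = 3.14159265358979323846;
  const double re = z.real();
  const double im = z.imag();

  if (re == 0.0 && im == 0.0)
  {
    return 0.0; // phase of zero is undefined, return 0 by convention
  }

  if (std::abs(re) >= std::abs(im))
  {
    const double a = approx_atan(im / re);
    if (re > 0.0) { return a; }          // right half-plane
    return (im >= 0.0) ? a + pi : a - pi; // left half-plane
  }

  const double a = approx_atan(re / im);
  return (im > 0.0) ? (pi / 2 - a) : (-pi / 2 - a); // upper / lower half-plane
}
```

Calling `approx_arg(bin)` in place of `std::arg(bin)` avoids the library arctangent at the cost of a small phase error per bin.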
Regarding [1], my current observation is that the sliding vocoder generally produces fewer artifacts, especially when pitch shifting instrumental recordings. So it makes more sense to explore the sliding DFT first instead of obfuscating the vocoder...
Idea:
Use $\log(a \cdot e^{j \phi}) = \log(a) + j \phi$ instead of explicit `std::abs` and `std::arg` calls, since both log-amplitude and phase are needed anyway.
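The identity above could be sketched as follows. The helper name `log_polar` is hypothetical and not part of the project; it only shows that a single `std::log` call on the complex bin yields the log-amplitude in the real part and the principal phase in the imaginary part.

```cpp
#include <cmath>
#include <complex>
#include <utility>

// Returns {log-amplitude, phase} of a single DFT bin via one complex log.
// Hypothetical helper for illustration only.
// Note: for a zero bin the log-amplitude is -infinity.
inline std::pair<double, double> log_polar(const std::complex<double>& bin)
{
  const std::complex<double> z = std::log(bin); // log(a) + j * phi
  return { z.real(), z.imag() };                // phi is in [-pi, pi]
}
```

For example, `log_polar(std::polar(2.0, 0.5))` is expected to give approximately `{std::log(2.0), 0.5}`, matching `std::log(std::abs(bin))` and `std::arg(bin)` computed separately.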
The pitch shifting result is comparable to that of signalsmith:
make
./out/main hybrid-phase --trim --freq=2 input.wav output.wav
where `input.wav` is the original `dno-solo` example converted to 16-bit mono WAV.
Actually, the default signalsmith configuration uses 6144 (3072 without zero padding) DFT bins, which is not a power of two. Disabling `multipleTimeObservations` and `zeroPadding` makes no noticeable difference.
The comparable stftPitchShift configuration is `-w 8k -v 4`.