jurihock / stftPitchShift

STFT based real-time pitch and timbre shifting in C++ and Python
MIT License

Does the Python library still work? #35

Closed: Picus303 closed this issue 1 year ago

Picus303 commented 1 year ago

I tried the simplest implementation I could think of, but I still get a broken result. Is there something I should know that isn't in the README? Here is my code:

from stftpitchshift import StftPitchShift
from scipy.io import wavfile

samplerate, data = wavfile.read('voice.wav')
pitchshifter = StftPitchShift(1024, 256, samplerate)

new_data = pitchshifter.shiftpitch(data, 1)
wavfile.write("edited.wav", samplerate, new_data)

As I understand it, with a pitch factor of 1 this code shouldn't modify the audio at all.

jurihock commented 1 year ago

The integer audio data returned by scipy.io.wavfile.read must be converted to float and normalized to [-1, +1] before it is passed to shiftpitch.

Picus303 commented 1 year ago

Thanks, that worked. This is important information and I couldn't find it in the README; maybe you should add it. Here is the corrected code:

from stftpitchshift import StftPitchShift
from scipy.io import wavfile
import numpy as np

samplerate, data = wavfile.read('voice.wav')
data = data.astype(np.float16)

max_val = np.max(np.abs(data))
data = data/max_val

pitchshifter = StftPitchShift(1024, 256, samplerate)
new_data = pitchshifter.shiftpitch(data, 1.2)
new_data = (new_data*max_val).astype(np.int16)

wavfile.write("edited.wav", samplerate, new_data)

jurihock commented 1 year ago

Your snippet looks incorrect; I would prefer this one instead:

from stftpitchshift import StftPitchShift
from scipy.io import wavfile
import numpy as np

samplerate, data = wavfile.read('voice.wav')

# convert original integer data type to a normalized float data type
# (skip this step if it's already a normalized float: np.iinfo only accepts integer types)
dtype = data.dtype
scale = np.iinfo(dtype).max ** -1
data = data.astype(np.float32) # use at least float32
data = data * scale

pitchshifter = StftPitchShift(1024, 256, samplerate)
data = pitchshifter.shiftpitch(data, 1)

# convert result to the desired integer data type
# (or keep it as is)
dtype = np.int16
scale = np.iinfo(dtype).max
data = data.clip(-1, +1) # clip to [-1, +1] so the integer conversion below cannot overflow
data = (data * scale).astype(dtype)

wavfile.write('edited.wav', samplerate, data)
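To illustrate why this version uses at least float32 and clips before converting back, here is a small NumPy-only check (all variable names are illustrative). float16 has an 11-bit significand, so above 2048 not every integer is representable and many int16 sample values change in a round trip; float32 (24-bit significand) represents all of them exactly. The second half uses a hypothetical shifted signal that overshoots [-1, +1]: clipping first keeps every scaled value inside the int16 range, whereas NumPy's cast of out-of-range floats to int16 is not guaranteed to saturate.

```python
import numpy as np

# Every possible int16 sample value.
samples = np.arange(-32768, 32768).astype(np.int16)

# float16: 11-bit significand, so many int16 values are rounded in the round trip.
lossy16 = np.count_nonzero(samples.astype(np.float16).astype(np.int32) != samples)

# float32: 24-bit significand, every int16 value survives unchanged.
lossy32 = np.count_nonzero(samples.astype(np.float32).astype(np.int32) != samples)

print(f"float16 alters {lossy16} of {samples.size} int16 values")
print(f"float32 alters {lossy32} of {samples.size} int16 values")

# Hypothetical shifted output that overshoots [-1, +1].
shifted = np.array([0.5, -1.3, 1.2], dtype=np.float32)
scale = np.iinfo(np.int16).max

# Clip first, then scale; astype truncates the fractional part toward zero.
safe = (shifted.clip(-1, +1) * scale).astype(np.int16)
print(safe)
```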