Open sekkit opened 1 month ago
Switching from wavfile to pysoundfile and changing converted_audio to converted_audio.T works:

    from soundfile import SoundFile
    with SoundFile('stereo_file.wav', 'w', 48000, 2) as f:
        f.write(converted_audio.T)
If you use a stereo wave file, please use only one channel (most people use the first channel of the audio).
Our model only supports single-channel audio.
    source_audio, sample_rate = torchaudio.load(a.source_speech)
    source_audio = torch.mean(source_audio, dim=0, keepdim=True)

I modified the script by adding the second line, which converts stereo audio to mono, and the error is gone.
I recommend using audio[0:1, :] (the first channel only), because averaging two channels might produce a reverb-like sound in some cases.
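The difference between the two downmix options can be seen on a toy signal. The following is a minimal sketch in plain Python, not code from the repository: the helper names `first_channel` and `mean_downmix` and the sample values are hypothetical, and the audio is assumed to be laid out channels-first, as a list of per-channel sample lists. It shows that when one channel is a delayed copy of the other, averaging changes the waveform, while taking the first channel leaves it intact.

```python
def first_channel(channels):
    """Keep only channel 0, preserving the (channels, samples) layout."""
    return [list(channels[0])]


def mean_downmix(channels):
    """Average all channels sample by sample into a single channel."""
    n = len(channels)
    return [[sum(frame) / n for frame in zip(*channels)]]


# Hypothetical stereo clip: the right channel is the left channel
# delayed by one sample (values chosen only for illustration).
left = [1.0, -1.0, 1.0, -1.0]
right = [0.0, 1.0, -1.0, 1.0]
stereo = [left, right]

mono_first = first_channel(stereo)  # the left channel, unchanged
mono_mean = mean_downmix(stereo)    # delayed copies partly cancel each other

print(mono_first)
print(mono_mean)
```

For a channels-first tensor such as the one returned by `torchaudio.load`, the slice `source_audio[0:1, :]` plays the role of `first_channel` here and keeps the channel dimension.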
X:\tmp\HierSpeechpp\inference_vc.py:78: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
source_audio = torchaudio.functional.resample(source_audio, sample_rate, 16000, resampling_method="kaiser_window")
C:\Users\sekkitshi\AppData\Local\miniconda3\envs\hierspeech2\lib\site-packages\amfm_decompy\pYAAPT.py:970: RuntimeWarning: invalid value encountered in divide
phi[lag_min:lag_max] = formula_nume/np.sqrt(formula_denom)
X:\tmp\HierSpeechpp\inference_vc.py:103: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
target_audio = torchaudio.functional.resample(target_audio, sample_rate, 16000, resampling_method="kaiser_window")
Traceback (most recent call last):
File "X:\tmp\HierSpeechpp\inference_vc.py", line 254, in <module>
main()
File "X:\tmp\HierSpeechpp\inference_vc.py", line 251, in main
inference(a)
File "X:\tmp\HierSpeechpp\inference_vc.py", line 220, in inference
VC(a, hierspeech)
File "X:\tmp\HierSpeechpp\inference_vc.py", line 168, in VC
write(output_file, 48000, converted_audio)
File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\hierspeech2\lib\site-packages\scipy\io\wavfile.py", line 797, in write
fmt_chunk_data = struct.pack('<HHIIHH', format_tag, channels, fs,
struct.error: ushort format requires 0 <= number <= 0xffff
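A likely reading of this error, offered as a sketch and not a confirmed diagnosis: `scipy.io.wavfile.write` documents multi-channel data as a 2-D array of shape (Nsamples, Nchannels), and the WAV header stores the channel count in an unsigned 16-bit field. If the array is channels-first, the sample count ends up in that field and no longer fits. The snippet below reproduces the effect with the standard library only. The helpers `wav_channel_count` and `pack_channels` and the example shapes are hypothetical stand-ins, not scipy functions.

```python
import struct


def wav_channel_count(shape):
    """Read the channel count the way a (samples, channels) writer would."""
    return 1 if len(shape) == 1 else shape[1]


def pack_channels(shape):
    """Pack the channel count as an unsigned 16-bit little-endian field."""
    return struct.pack('<H', wav_channel_count(shape))


# Hypothetical 10-second stereo clip at 48 kHz, stored channels-first.
channels_first = (2, 480000)

try:
    pack_channels(channels_first)  # 480000 is read as the channel count
    failed = False
except struct.error:
    failed = True

# The transposed layout (samples, channels) yields a channel count of 2.
packed_ok = pack_channels((480000, 2))

print(failed, packed_ok)
```

Under this assumption, transposing the array before writing, or reducing it to a 1-D mono array, gives the writer a channel count that fits the header field.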