sh-lee-prml / HierSpeechpp

The official implementation of HierSpeech++
MIT License
1.15k stars 134 forks

[bug] Error occurs when running voice conversion (VC) on my own audio #51

Open sekkit opened 1 month ago

sekkit commented 1 month ago

X:\tmp\HierSpeechpp\inference_vc.py:78: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  source_audio = torchaudio.functional.resample(source_audio, sample_rate, 16000, resampling_method="kaiser_window")
C:\Users\sekkitshi\AppData\Local\miniconda3\envs\hierspeech2\lib\site-packages\amfm_decompy\pYAAPT.py:970: RuntimeWarning: invalid value encountered in divide
  phi[lag_min:lag_max] = formula_nume/np.sqrt(formula_denom)
X:\tmp\HierSpeechpp\inference_vc.py:103: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  target_audio = torchaudio.functional.resample(target_audio, sample_rate, 16000, resampling_method="kaiser_window")
Traceback (most recent call last):
  File "X:\tmp\HierSpeechpp\inference_vc.py", line 254, in <module>
    main()
  File "X:\tmp\HierSpeechpp\inference_vc.py", line 251, in main
    inference(a)
  File "X:\tmp\HierSpeechpp\inference_vc.py", line 220, in inference
    VC(a, hierspeech)
  File "X:\tmp\HierSpeechpp\inference_vc.py", line 168, in VC
    write(output_file, 48000, converted_audio)
  File "C:\Users\sekkitshi\AppData\Local\miniconda3\envs\hierspeech2\lib\site-packages\scipy\io\wavfile.py", line 797, in write
    fmt_chunk_data = struct.pack('<HHIIHH', format_tag, channels, fs,
struct.error: ushort format requires 0 <= number <= 0xffff

sekkit commented 1 month ago

Switching from scipy's wavfile to pysoundfile and writing converted_audio.T instead of converted_audio works:

    from soundfile import SoundFile
    with SoundFile('stereo_file.wav', 'w', 48000, 2) as f:
        f.write(converted_audio.T)
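A likely reason the transpose helps: scipy.io.wavfile.write treats a 2-D array as (samples, channels), so a channels-first array of shape (2, N) is read as having N channels, and that count does not fit the unsigned 16-bit channel field packed in the traceback above. The sketch below reproduces only that overflow using the standard library. pack_fmt_fields is a hypothetical helper written for illustration, not scipy's code, and the sample count 96000 is an arbitrary assumed value.

```python
import struct


def pack_fmt_fields(num_channels, sample_rate=48000):
    """Pack format tag, channel count and sample rate, little-endian.

    Hypothetical helper: it mimics only the first three fields of a WAV
    'fmt ' chunk. 'H' is an unsigned 16-bit integer (0..65535) and 'I' is
    an unsigned 32-bit integer.
    """
    pcm_format_tag = 0x0001
    return struct.pack('<HHI', pcm_format_tag, num_channels, sample_rate)


# A samples-first array of shape (N, 2) yields a channel count of 2, which fits.
header = pack_fmt_fields(2)

# A channels-first array of shape (2, N) read as (samples, channels) yields a
# channel count of N. Assumed here: N = 96000 samples, which exceeds 65535.
overflowed = False
try:
    pack_fmt_fields(96000)
except struct.error:
    overflowed = True

print(len(header), overflowed)
```

Under this reading, either transposing to (samples, channels) before writing or reducing the audio to a single channel before inference keeps the channel count within range.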

sh-lee-prml commented 1 month ago

If you use a stereo wave file, please use only one channel (most people use the first channel of the audio).

Our model only supports single-channel audio.

sekkit commented 1 month ago

    source_audio, sample_rate = torchaudio.load(a.source_speech)
    source_audio = torch.mean(source_audio, dim=0, keepdim=True)  # added by me

After adding the second line, which converts stereo audio to mono, the error is gone.

sh-lee-prml commented 1 month ago

I recommend using audio[0:1, :] (the first channel only), because averaging the two channels might produce a reverb-like sound in some cases.
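A minimal sketch of the first-channel selection, assuming a channels-first array of shape (channels, samples), which is the layout torchaudio.load returns by default. NumPy is used here only to keep the example lightweight, and the same slice syntax applies to a torch tensor. first_channel is a hypothetical helper name, and the sample values are made up.

```python
import numpy as np


def first_channel(audio):
    """Keep only channel 0 of a channels-first array, preserving the 2-D shape."""
    return audio[0:1, :]


# Made-up stereo clip: 2 channels, 3 samples each.
stereo = np.array([[0.1, 0.2, 0.3],
                   [0.3, 0.2, 0.1]], dtype=np.float32)

mono = first_channel(stereo)         # shape (1, 3): channel 0 only
averaged = stereo.mean(axis=0, keepdims=True)  # shape (1, 3): mix of both channels
unchanged = stereo[0:, :]            # shape (2, 3): an open-ended slice keeps every channel

print(mono.shape, averaged.shape, unchanged.shape)
```

The slice needs an explicit end index: 0:1 keeps one channel, while an open-ended 0: leaves the stereo array as it was.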