thuhcsi / NeuCoSVC


convert multi-channel input into single-channel #3

Closed darai0512 closed 10 months ago

darai0512 commented 10 months ago

Hi, my Japanese voice AI enthusiast community and I like this product.

This is a trivial PR. If you don't need it, please close it.

Abstract

If the input audio is stereo, I get the following error.

(venv) D:\NeuCoSVC>python infer.py --src_wav_path input.wav --ref_wav_path ref.wav --out_path out --speech_enroll
using cuda for inference.
Loading svc model configurations.
wavlm loaded.
loading models cost 6.86s.
Processing feats.
The wav file input.wav has 2 channels, select the first one to proceed.
D:\NeuCoSVC\venv\lib\site-packages\librosa\core\convert.py:1332: RuntimeWarning: divide by zero encountered in log10
  + 2 * np.log10(f_sq)
The wav file input.wav has 2 channels, select the first one to proceed.
pitch shift factor: 1.10
Original audio sr is 24000, change it to 16000.
Traceback (most recent call last):
  File "D:\NeuCoSVC\infer.py", line 153, in <module>
    VoiceConverter(test_utt=args.src_wav_path, ref_utt=args.ref_wav_path, out_path=args.out_path,
  File "D:\NeuCoSVC\infer.py", line 44, in VoiceConverter
    query_feats = wavlm_encoder.get_features(test_utt, weights=applied_weights)
  File "D:\NeuCoSVC\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\NeuCoSVC\modules\wavlm_encoder.py", line 97, in get_features
    features = (features*weights[:, None] ).sum(dim=0) # (1, seq_len, dim)
RuntimeError: The size of tensor a (50) must match the size of tensor b (25) at non-singleton dimension 0

The log message "The wav file input.wav has 2 channels, select the first one to proceed." and the docstring "test_utt (str): Path to the source singing waveform (24kHz, single-channel)." both tell us that the input audio should be single-channel, but the final error message was hard for me to interpret.

The other processing steps select the first channel when the input is multi-channel, so this PR makes infer do the same. (Or would it be better to raise an error saying that the input audio should be single-channel?)
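
To illustrate the "select the first channel" behavior described above, here is a minimal sketch. It is not the code from this PR: the helper name select_first_channel is hypothetical, and it assumes the waveform is already loaded as a NumPy array in (channels, samples) layout.

```python
import numpy as np


def select_first_channel(wav: np.ndarray) -> np.ndarray:
    """Return a 1-D mono signal.

    Assumption: multi-channel audio is laid out as (channels, samples).
    A 1-D array is treated as already mono and returned unchanged.
    """
    if wav.ndim == 1:
        return wav
    if wav.ndim != 2:
        raise ValueError(f"expected a 1-D or 2-D array, got {wav.ndim} dimensions")
    # Keep only the first channel, as the existing log message describes.
    return wav[0]


# Hypothetical stereo input: channel 0 is all zeros, channel 1 is all ones.
stereo = np.stack([np.zeros(4), np.ones(4)])
mono = select_first_channel(stereo)
```

Some loaders return (samples, channels) instead, in which case the array would need to be transposed first.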

jerry1331 commented 10 months ago

Thank you for your support and the feedback. We have developed our project specifically for mono audio and have not tested it with stereo audio. Considering that using stereo audio may cause additional errors, we have addressed this issue in a new commit. Inputting stereo audio will now result in an AssertionError.
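
The commit itself is not shown in this thread. As a rough sketch of what such a guard could look like, the example below checks the channel count before inference and fails early with a readable message. The helper name assert_mono is hypothetical, and it assumes a PCM WAV file that Python's standard wave module can read.

```python
import wave


def assert_mono(wav_path: str) -> None:
    """Fail early if the WAV file at wav_path is not single-channel.

    Assumption: the file is uncompressed PCM WAV, readable by the
    standard-library wave module.
    """
    with wave.open(wav_path, "rb") as wf:
        n_channels = wf.getnchannels()
    assert n_channels == 1, (
        f"{wav_path} has {n_channels} channels; "
        "input audio should be single-channel."
    )
```

Note that Python skips assert statements when run with -O, so raising ValueError explicitly would be the stricter choice if that matters.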