Open aicoder2048 opened 6 months ago
Does source_se need to come from audio of the same person's voice as the source audio used for inference in order to get close or better clone quality?
I got the following warnings. Could any of them drastically degrade the clone similarity?
I tried using the same person's voice (the base speaker's mp3) both for extracting source_se (the tone color embedding) and as the source audio for inference, with a third, male voice mp3 as the reference speaker. The resulting cloned audio, which sometimes sounds female and a bit noisy, is still far from the male reference audio. Very bizarre!
So my conclusion from the experiment is that source_se and the source audio for inference don't have to come from the same person, or at least that matching them doesn't improve clone similarity.
Just a couple of cents to share ... have fun
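To put a number on "far from the reference" instead of judging by ear, one option is to compare speaker embeddings with cosine similarity. This is only a minimal sketch: it assumes each tone color embedding can be flattened to a plain list of floats, and the vectors and the names reference_se and cloned_se are made-up placeholders, not OpenVoice output or API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


# Hypothetical stand-ins for flattened embeddings (assumption, not real data).
reference_se = [0.9, 0.1, -0.3, 0.4]
cloned_se = [0.8, 0.2, -0.2, 0.5]

print(f"similarity: {cosine_similarity(reference_se, cloned_se):.3f}")
```

Values near 1 mean the two embeddings point in nearly the same direction. What counts as a good score for this model is not something this thread establishes.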
Sean
It is true that with V1 the generated output does not sound similar to the reference audio used for cloning. I don't think it clones the voice very well.
Hi,
I am trying out OpenVoice (V1). It runs without errors, but the cloned voice is far from the reference speaker. Sometimes I give it a male reference speaker mp3 and get back a female voice.
I ran the code from "demo_part1.ipynb" and changed only the reference speaker's mp3.
I suspect the torch/embedding version is not compatible. I am using:
(Speech2Rag) OpenVoice> pip show torch
Name: torch
Version: 2.1.2+cu121
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: C:\Users\Sean2092\miniconda3\Lib\site-packages
Requires: filelock, fsspec, jinja2, networkx, sympy, typing-extensions
Required-by: pytorch-lightning, torchaudio, torchmetrics, torchvision
Could someone who has had success with this help out? I am sure I got something wrong, in the libraries or the settings, but I cannot figure out what. Please help.
Thanks a lot, Sean