myshell-ai / OpenVoice

Instant voice cloning by MIT and MyShell.
https://research.myshell.ai/open-voice
MIT License
29.02k stars 2.84k forks

Nothing alike #43

Open chrisbward opened 9 months ago

chrisbward commented 9 months ago

Not sure what's happening here - I managed to spin this up in the local Gradio app and recorded my own voice, but inference gave me an American-sounding output. I'm British; is that expected?

Thanks!

chrisbward commented 9 months ago

I tried mimicking different voices into my microphone, but the output is always the same. Is the app broken?

chrisbward commented 9 months ago

Okay, so a little further investigation:

I copied demo_speaker1.mp3 to someother.mp3 and dragged it into Gradio, and the voice is not cloned.

If I drag demo_speaker1.mp3 in, it works fine, so I do not think any inference is happening at all. The result is determined by filename.

pixelass commented 9 months ago

I tried this in Colab and did several recordings. Result: it is not good at all; I mean it is VERY bad in terms of likeness. Sorry, but it seems to be very biased toward certain voices that it can copy.

Just another overhyped no-good model :(

chrisbward commented 9 months ago

I think there are bugs in the Gradio app and the cloning is not attempted at all.

pixelass commented 9 months ago

Oh, no, it is definitely using the audio (at least in Colab, if the "use microphone" checkbox is checked), but the result is nothing like the original voice (it is noticeable that it is trying to copy a voice, though).

chrisbward commented 9 months ago

You'll notice that if you swap the files around in the samples folder, it uses the original sample's voice, even when it has been removed.

basher0 commented 9 months ago

Why does the voice in the paper samples sound perfect, but when I run it locally, it doesn't sound anything like it? StyleTTS 2 is much better.

iwoolf commented 9 months ago

I installed it locally and tried my own voice, and got a standard voice instead. I tried the supplied demos, and they all came out in exactly the same standard voice. I can't figure out how to change the accent; there are no example prompts.

Zengyi-Qin commented 9 months ago

Hi all - regarding the accent, please read the paper carefully before judging the results. The accent is controlled by the base speaker model; the tone color converter does not clone your accent. This demo only provides control over emotion, and the accent defaults to American. Users can replace the base speaker model in OpenVoice with their own (e.g. a British-accented one). The OpenVoice framework provides sufficient flexibility to do this and allows users to use whatever base speaker model they have.
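To make the split above concrete, here is a minimal sketch of the two-stage pipeline. Note that `base_speaker_tts` and `tone_color_convert` are illustrative stubs, not the real OpenVoice API: stage 1 (base speaker) fixes text, accent and emotion; stage 2 (tone color converter) only swaps timbre, so the reference clip's accent never reaches the output.

```python
def base_speaker_tts(text, accent="american"):
    # Stage 1 (stub): the base speaker model determines the accent and
    # speaking style. In OpenVoice this would be a full TTS model.
    return {"text": text, "accent": accent, "tone_color": "base"}

def tone_color_convert(audio, reference):
    # Stage 2 (stub): the tone color converter replaces only the timbre
    # ("tone color") with that of the reference clip; accent is untouched.
    converted = dict(audio)
    converted["tone_color"] = reference["tone_color"]
    return converted

reference = {"tone_color": "british_user"}
out = tone_color_convert(base_speaker_tts("hello"), reference)
# out["tone_color"] is "british_user", but out["accent"] is still
# "american": only swapping the base speaker model changes the accent.
```

This is why cloning a British reference clip with the default checkpoints still yields American-accented speech.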

Tpann2518 commented 9 months ago

ok

davechilds commented 9 months ago

I have also tried the app as a local install, and the voices come back sounding nothing like the reference audio. They all seem to sound like a teenage boy, with slight differences when different references are used. I tried resampling the original audio, in case it needed a specific bitrate or sample rate, but this did not make any difference. Is there any reason why the results are so different from the examples given on the website?

hehuan2363 commented 9 months ago

I had a similar experience to everyone above in Colab; the cloning likeness is not very good (far worse than the demo example, video here: https://youtu.be/Fx4iiy4eVoM?t=558). I am also wondering if something is wrong. I tested OpenAI TTS plus RVC before (a similar idea behind it) with better results.

For Chinese cloning, BERT-VITS2 is still the best open solution I have found so far.

Zengyi-Qin commented 9 months ago

@hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

YKefasu commented 9 months ago

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

It has the same result.

pixelass commented 9 months ago

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

I also tested this. The same inaccuracy on every voice I've tried; none of them come close to the examples.

Aegon95 commented 9 months ago

> @hehuan2363 could you try the lepton demo and see if anything changed https://www.lepton.ai/playground/openvoice

Same result. The output is nowhere close and sounds childish.

DKRacingFan commented 9 months ago

I'm sorry, but this model is very inaccurate and bad. Hopefully you update it to significantly improve it, as all the voices I tried were bad: Joe Biden, MattVidProAi, MrBeast, etc.

RASPIAUDIO commented 9 months ago

I agree, the result is very disappointing; it almost feels like a scam. There are other, older open-source models that do it better: https://github.com/coqui-ai/TTS

Zengyi-Qin commented 9 months ago

You can find the answers here https://github.com/myshell-ai/OpenVoice/blob/main/QA.md

RASPIAUDIO commented 9 months ago

> You can find the answers here https://github.com/myshell-ai/OpenVoice/blob/main/QA.md

Everything seems OK, but I get this warning; not sure if it is a problem?

```
(openvoice) PS C:\holobot\prod\OpenVoice> & C:/Users/olivi/anaconda3/envs/openvoice/python.exe c:/holobot/prod/OpenVoice/testopenvoice2.py
Loaded checkpoint 'checkpoints/converter/checkpoint.pth'
missing/unexpected keys: [] []
[(0.0, 13.33), (13.358, 23.666), (23.886, 38.002), (38.67, 51.144)]
after vad: dur = 50.228
C:\Users\olivi\anaconda3\envs\openvoice\lib\site-packages\wavmark\models\my_model.py:25: UserWarning: istft will require a complex-valued input tensor in a future PyTorch release. Matching the output from stft with return_complex=True. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\SpectralOps.cpp:980.)
  return torch.istft(signal_wmd_fft, n_fft=self.n_fft, hop_length=self.hop_length, window=window,
```

jackyin68 commented 8 months ago

Oh, no. But how do I solve this? Please give a more detailed example!