wsippel / bark_tts

Oobabooga extension for Bark TTS
GNU Affero General Public License v3.0

update bark? #17

Open St33lMouse opened 1 year ago

St33lMouse commented 1 year ago

Is this extension using Bark v2? I tried this repo here:

https://github.com/rsxdalv/tts-generation-webui

I used that repo to clone a voice, then dropped the voice file into the voices folder of your extension so I could hear the output in ooba. It works! But... I think the voice quality is better in the repo I linked above. I'm wondering if Bark needs to be updated for this extension?

Anyway, great work!

Ph0rk0z commented 1 year ago

His Bark is out of date and has no custom commits. You can just reinstall it from the main repo.

What that other repo probably did was implement extra things for voice cloning, just like: https://github.com/gitmylo/audio-webui

If you think it's good with pure Bark, Bark with RVC on top is even better.

St33lMouse commented 1 year ago

I'll check out that repo you linked to. What's RVC?

Ph0rk0z commented 1 year ago

RVC is a model that trains on pitches from your target and then "corrects" the audio you feed it to match them.

So your not-so-good TTS output sounds much more like the person. It works on songs too.

St33lMouse commented 1 year ago

Are there any repos that you'd recommend? Does it mix with Bark in any way?

Ph0rk0z commented 1 year ago

That webui lets you clone with Bark and then run RVC over the result for really good results. Unfortunately, nobody has taken this process and used it for AI chat TTS.

St33lMouse commented 1 year ago

I installed it, but it has a weird interface with truly minimal explanations of what anything does, and it gave me some errors, so I decided to let it bake for a while before coming back to it. So, do you:

clone with Bark, generate TTS output, and then have RVC improve that output,

OR

use RVC to improve the Bark clone file itself?

An improved Bark clone file could then be dropped into the ooba Bark extension and used as a better voice. THAT would really be worth looking into.

Ph0rk0z commented 1 year ago

You clone with Bark (or use some other similar TTS) to generate the output, and then you run the RVC model over that output.
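To make the order concrete, here is a minimal sketch of that two-stage flow. Everything in it is hypothetical: `speak`, `tts`, and `convert` are illustrative names, not functions from Bark, RVC, or audio-webui, and the audio buffer is just a list of floats standing in for real samples.

```python
from typing import Callable, List

# Placeholder type for an audio buffer (assumption: real code would use an array).
Audio = List[float]


def speak(
    text: str,
    tts: Callable[[str], Audio],
    convert: Callable[[Audio], Audio],
) -> Audio:
    """Generate speech first, then run voice conversion over the result."""
    raw = tts(text)       # stage 1: e.g. Bark with a cloned voice prompt
    return convert(raw)   # stage 2: e.g. an RVC model trained on the target speaker
```

The point is only that RVC sits after the TTS step and works on the generated audio, not on the clone file.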

I already have NPZ voice files to try in Bark here, but I don't think cloning is set up to work consistently in default Bark. Make a file with that repo and try it.
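Before dropping an NPZ into the voices folder, it can help to check what is inside it. This is a rough sketch using only the standard library, relying on the fact that an `.npz` file is a zip archive with one `.npy` entry per array. The expected array names (`semantic_prompt`, `coarse_prompt`, `fine_prompt`) are an assumption based on how Bark's bundled voice prompts appear to be laid out, and `npz_keys` / `missing_keys` are hypothetical helper names.

```python
import zipfile
from pathlib import Path

# Assumption: the array names a Bark voice prompt is expected to contain.
EXPECTED_KEYS = {"semantic_prompt", "coarse_prompt", "fine_prompt"}


def npz_keys(path):
    """Return the array names stored in an .npz file (a zip of .npy entries)."""
    with zipfile.ZipFile(path) as zf:
        return {Path(name).stem for name in zf.namelist() if name.endswith(".npy")}


def missing_keys(path):
    """Return the expected array names that are absent from the file."""
    return EXPECTED_KEYS - npz_keys(path)
```

If `missing_keys("my_voice.npz")` comes back as an empty set, the file at least has the expected layout. That says nothing about how good the clone will sound.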