kanttouchthis / text_generation_webui_xtts

XTTSv2 Extension for oobabooga text-generation-webui
148 stars 17 forks

WOW Great extension! The best TTS extension out there! Here are some code fixes for auto play and installation! #3

Open RandomInternetPreson opened 11 months ago

RandomInternetPreson commented 11 months ago

Firstly, thank you for taking the time to do this!!! OMG it's fast, does perfect inflections, this is eleven labs quality on my local machine AMAZING!!!!!

Here is some information to make the extension work a bit better. I'm on a Windows machine, so my experience might be unique to that.

  1. Auto-play keeps trying to play every audio clip in the history. To fix this, change this:

def history_modifier(history):
    if len(history["internal"]) > 0:
        history["visible"][-1] = [
            history["visible"][-1][0],
            history["visible"][-1][1].replace("controls autoplay>", "controls>"),
        ]
    return history

to this:

def history_modifier(history):
    if len(history["internal"]) > 0:
        history["visible"][-1] = [
            history["visible"][-1][0],
            history["visible"][-1][1].replace(
                "controls autoplay style=\"height: 30px;\">",
                "controls style=\"height: 30px;\">",
            ),
        ]
    return history

  2. The extension did not load successfully at first. This is because the folder created in the oobabooga extensions directory has hyphens in its name. Users need to rename the folder from:

text-generation-webui-xtts

to:

text_generation_webui_xtts
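A minimal sketch of why the rename matters, assuming the webui loads each extension with a plain Python `import` built from the folder name (the traceback later in this thread shows `exec(f"import extensions.{name}.script")`). The helper names below are hypothetical and only illustrate the check:

```python
def is_importable_name(folder_name: str) -> bool:
    """Return True if folder_name is a valid Python identifier."""
    return folder_name.isidentifier()


def suggest_folder_name(folder_name: str) -> str:
    """Replace hyphens with underscores so the name can appear in an import."""
    return folder_name.replace("-", "_")


for name in ("text-generation-webui-xtts", "text_generation_webui_xtts"):
    status = "importable" if is_importable_name(name) else "not importable"
    print(f"{name}: {status}")
```

A hyphen is parsed as a minus sign inside an `import` statement, which is why only the underscore form can be used as a package name.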

Seriously amazing stuff, thank you again for integrating this into oobabooga. I will do a pr just to have a copy to mess around with, but I'll direct people to this repo.

allenhs commented 11 months ago

Thanks!

I had to make a change to get it to work right on Linux.

I changed:

"controls autoplay style="height: 30px;">", "controls style="height: 30px;">")

to:

'controls autoplay style="height: 30px;">', 'controls style="height: 30px;">'

I used ChatGPT to help me make the fix. It works for me, but I don't know how correct this change is.

RandomInternetPreson commented 11 months ago

What that bit of code is doing is replacing the strings inside the log file and removing the "autoplay" attribute.

Your code embeds the source location of the .wav files slightly differently from the original Bark TTS code, as you can see in your format_html function:

def format_html(audiofiles):
    if params["combine"]:
        autoplay = "autoplay" if params["autoplay"] else ""
        combined = combine(audiofiles)
        time_label = audiofiles[0].split("/")[-1].split("_")[0]
        sf.write(f"{this_dir}/generated/{time_label}_combined.wav", combined, 24000)
        return f'<audio src="file/{this_dir}/generated/{time_label}_combined.wav" controls {autoplay} style="height: 30px;">'
    else:
        string = ""
        for audiofile in audiofiles:
            string += f''
        return string

You can see the string the code fix addresses: controls style="height: 30px;">

so we are making sure we are changing this from

"controls autoplay style="height: 30px;">"

to

"controls style="height: 30px;">"

in the history of the conversation with the AI so it doesn't keep autoplaying.
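A small sketch of the point above, assuming a message shaped like the tag that format_html returns when autoplay is on. The `strip_autoplay` helper and the example file name are hypothetical; only the two search strings come from this thread:

```python
OLD_PATTERN = "controls autoplay>"
NEW_PATTERN = 'controls autoplay style="height: 30px;">'


def strip_autoplay(html: str, pattern: str) -> str:
    """Replace pattern with the same text minus the autoplay attribute."""
    return html.replace(pattern, pattern.replace(" autoplay", ""))


# Hypothetical message, shaped like the tag built in format_html
message = '<audio src="file/example_combined.wav" controls autoplay style="height: 30px;">'

# The old pattern never occurs in the tag, so the message is left untouched
print(strip_autoplay(message, OLD_PATTERN))
# The new pattern matches the tag exactly, so autoplay is removed
print(strip_autoplay(message, NEW_PATTERN))
```

`str.replace` only acts on exact substring matches, so the search string has to include the `style` attribute that sits between `autoplay` and the closing `>`.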

RandomInternetPreson commented 11 months ago

I edited the .py file in my fork for you to reference if you need it:

https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts/blob/main/script.py

wow this works so incredibly well!

RandomInternetPreson commented 11 months ago

Sorry to keep peppering you here in this issue, but just wanted to let you know that I'd be okay if you wanted to reference my fork here: https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts for folks installing the extension for windows.

erew123 commented 11 months ago

I'll close my other issue on here, but I can confirm that on a 100% fresh install of Text-Gen-WebUI on Windows, I did the following:

Run a command prompt, then:

cd text-generation-webui   (wherever you have it stored on your disk)
cmd_windows.bat   (cmd_windows.bat will activate your environment. Linux and Mac options are there too)
cd extensions
git clone https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts
cd text_generation_webui_xtt_Alts
pip install -r requirements.txt
pip install TTS --no-dependencies

cd back up to the text-generation-webui folder and run start_windows.bat.

Agree to the license and let it download the other files it needs. (Ensure it's activated on the "session" tab and apply/restart.)

With all that done, it's running fine! :) No audio repeats etc.

One thing I do notice, it keeps the generated audio in \text-generation-webui\extensions\text_generation_webui_xtt_Alts\generated so that may need clean up from time to time.

I'm sure the changes will get merged back into the original on here at some point!

Thanks for everyone's help and work on this!

erew123 commented 11 months ago

A quick note on speed vs quality etc., as it's not mentioned anywhere else. I notice the sample audio voice file used to generate audio is about 7 seconds long, Mono (not stereo), PCM S16 LE with a Sample rate of 22050Hz and Bits per sample 16.

I'm guessing there are a few factors that may speed up processing.

  • Keeping it the lower quality like the original file.
  • Fewer seconds in length (I think somewhere it says you need 4 to 12 seconds as a sample)

I tried a very simple test using a 22050Hz sample voice and a 44100Hz sample voice (9 second mono sample).

22050Hz > Processing time: 59.185802936553955
44100Hz > Processing time: 125.19529104232788

This was generating the same amount of speech. It's not highly scientific or run over thousands of tests, but it would appear that if you want to use your favourite celebrity voice, you should get a high quality sample, make it mono, drop its sample rate to 22050Hz and keep it around the 4-9 second mark. (I suspect a shorter voice sample will probably be faster.)
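To check a voice sample against the properties listed above (channels, bits per sample, sample rate, length), something like the following could be used. It is only a sketch built on Python's standard `wave` module, which reads uncompressed PCM WAV files; the function name and the file name in the usage comment are hypothetical:

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, bit depth, sample rate and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Usage (hypothetical file name):
# print(describe_wav("voice_sample.wav"))
```

A sample matching the one described above would report 1 channel, 16 bits per sample, 22050 Hz and roughly 7 seconds.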

fbradcdsc commented 11 months ago

Followed the steps but it still gives me this error:

ERROR:Failed to load the extension "text_generation_webui_xtt_Alts".
Traceback (most recent call last):
  File "C:\text-generation-webui\modules\extensions.py", line 36, in load_extensions
    exec(f"import extensions.{name}.script")
  File "<string>", line 1, in <module>
  File "C:\text-generation-webui\extensions\text_generation_webui_xtt_Alts\script.py", line 1, in <module>
    from TTS.api import TTS
ModuleNotFoundError: No module named 'TTS'

when restarting the webui after activating it in the session tab.

RandomInternetPreson commented 11 months ago

If you are using Windows, follow these instructions. I've made a video to go with them, and they will show you how to install TTS.

https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts/tree/main#installation-windows

kanttouchthis commented 11 months ago

> Sorry to keep peppering you here in this issue, but just wanted to let you know that I'd be okay if you wanted to reference my fork here: https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts for folks installing the extension for windows.

Thanks for your help!

> One thing I do notice, it keeps the generated audio in \text-generation-webui\extensions\text_generation_webui_xtt_Alts\generated so that may need clean up from time to time.

I added an option in the config.json to delete old files on startup.
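For anyone who prefers to clean the generated folder by hand, here is a rough sketch of an age-based cleanup. It is not the extension's own implementation and does not use its config.json; the function name and the age threshold are assumptions made for illustration:

```python
import time
from pathlib import Path


def delete_old_wavs(directory: str, max_age_seconds: float) -> list:
    """Delete .wav files older than max_age_seconds and return their names, sorted."""
    cutoff = time.time() - max_age_seconds
    deleted = []
    for wav_path in Path(directory).glob("*.wav"):
        # Compare the file's last-modified time against the cutoff
        if wav_path.stat().st_mtime < cutoff:
            wav_path.unlink()
            deleted.append(wav_path.name)
    return sorted(deleted)


# Usage (hypothetical path, one day threshold):
# delete_old_wavs("extensions/text_generation_webui_xtts/generated", 24 * 60 * 60)
```

Only files ending in .wav directly inside the given folder are touched; anything else is left alone.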

kanttouchthis commented 11 months ago

> A quick note on speed vs quality etc., as it's not mentioned anywhere else. I notice the sample audio voice file used to generate audio is about 7 seconds long, Mono (not stereo), PCM S16 LE with a Sample rate of 22050Hz and Bits per sample 16.
>
> I'm guessing there are a few factors that may speed up processing.
>
>   • Keeping it the lower quality like the original file.
>   • Fewer seconds in length (I think somewhere it says you need 4 to 12 seconds as a sample)
>
> I tried a very simple test using a 22050Hz sample voice and a 44100Hz sample voice (9 second mono sample).
>
> 22050Hz > Processing time: 59.185802936553955
> 44100Hz > Processing time: 125.19529104232788
>
> This was generating the same amount of speech. It's not highly scientific or run over thousands of tests, but it would appear that if you want to use your favourite celebrity voice, you should get a high quality sample, make it mono, drop its sample rate to 22050Hz and keep it around the 4-9 second mark. (I suspect a shorter voice sample will probably be faster.)

The model outputs 24 kHz mono files, so I presume that is the ideal format for samples as well. I could potentially write code to automatically resample the input files.
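A rough idea of what such resampling could look like, using only the standard library. This is a naive linear-interpolation sketch for 16-bit mono PCM files with no anti-aliasing filter, so it is an illustration rather than a quality resampler, and it is not code from the extension. Both function names are hypothetical, and whether 24 kHz input actually helps is the presumption stated above, not something tested here:

```python
import array
import sys
import wave


def resample_linear(samples, src_rate, dst_rate):
    """Resample a sequence of numbers by linear interpolation (no filtering)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


def resample_wav(src_path, dst_path, dst_rate=24000):
    """Write a copy of a 16-bit mono PCM WAV file resampled to dst_rate."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        src_rate = src.getframerate()
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    values = resample_linear(samples, src_rate, dst_rate)
    resampled = array.array("h", (int(round(v)) for v in values))
    if sys.byteorder == "big":
        resampled.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(dst_rate)
        dst.writeframes(resampled.tobytes())
```

For real use, a dedicated audio library with proper filtering would be a better choice than this interpolation loop.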

RandomInternetPreson commented 11 months ago

Yeass! You got the repo fixed up, thank you again for making this. It is one of the last missing pieces for AI interactions; the speed and quality are above everything else.

fbradcdsc commented 11 months ago

Alright I got it to work! The problem was I installed TTS in textgen and not in the base environment

kanttouchthis commented 11 months ago

> Alright I got it to work! The problem was I installed TTS in textgen and not in the base environment

As long as you have textgen activated when running the webui, that shouldn't be an issue.
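One way to see which environment is in play is to ask the interpreter itself. The sketch below, run with the same Python that starts the webui, prints the interpreter path and whether the `TTS` package is visible to it; `module_available` is a hypothetical helper name:

```python
import importlib.util
import sys


def module_available(module_name: str) -> bool:
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


print("Interpreter:", sys.executable)
print("TTS importable:", module_available("TTS"))
```

If this reports that TTS is not importable, the package was most likely installed into a different environment than the one that is currently activated.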