Open RandomInternetPreson opened 11 months ago
Thanks!
I had to make a change to get it to work right on Linux for me.
I changed:
"controls autoplay style="height: 30px;">", "controls style="height: 30px;">")
to:
'controls autoplay style="height: 30px;">', 'controls style="height: 30px;">'
I used ChatGPT to help me make the fix. It works for me, but I don't know how correct this change is.
What that bit of code is doing is replacing the strings inside the log file and removing the "autoplay" attribute.
Your code embeds the source location of the .wav files slightly differently than the original Bark TTS code does, as you can see in your format_html function:
    def format_html(audiofiles):
        if params["combine"]:
            autoplay = "autoplay" if params["autoplay"] else ""
            combined = combine(audiofiles)
            time_label = audiofiles[0].split("/")[-1].split("_")[0]
            sf.write(f"{this_dir}/generated/{time_label}_combined.wav", combined, 24000)
            return f'<audio src="file/{this_dir}/generated/{time_label}_combined.wav" controls {autoplay} style="height: 30px;">'
        else:
            string = ""
            for audiofile in audiofiles:
                string += f''
            return string
You can see the string the code fix addresses: controls style="height: 30px;">
so we are making sure we are changing this from
"controls autoplay style="height: 30px;">"
to
"controls style="height: 30px;">"
in the history of the conversation with the AI so it doesn't keep autoplaying.
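For anyone following along, here is a minimal sketch of what that replacement does. The `history` dict below is made up to look like what the webui hands to `history_modifier`, and the helper name `strip_autoplay` and the file path are hypothetical. Using single-quoted Python strings lets the double quotes around the `style` value sit inside the string without escaping, which is the point of the change above.

```python
# Sketch only: the history contents and helper name are assumptions for illustration.
AUTOPLAY = 'controls autoplay style="height: 30px;">'
NO_AUTOPLAY = 'controls style="height: 30px;">'


def strip_autoplay(history):
    """Remove the autoplay attribute from the most recent visible reply."""
    if len(history["internal"]) > 0:
        user_text, bot_html = history["visible"][-1]
        history["visible"][-1] = [user_text, bot_html.replace(AUTOPLAY, NO_AUTOPLAY)]
    return history


example = {
    "internal": [["hi", "hello"]],
    "visible": [["hi", '<audio src="file/example.wav" controls autoplay style="height: 30px;">']],
}
print(strip_autoplay(example)["visible"][-1][1])
```

Only the last visible message is touched, so older replies that were already cleaned stay as they are.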
I edited the .py file in my fork for you to reference if you need it:
https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts/blob/main/script.py
wow this works so incredibly well!
Sorry to keep peppering you here in this issue, but just wanted to let you know that I'd be okay if you wanted to reference my fork here: https://github.com/RandomInternetPreson/text_generation_webui_xtt_Alts for folks installing the extension for Windows.
I'll close my other issue on here, but I can confirm that on a 100% fresh install of Text-Gen-WebUI on Windows, I did the following:
Run a command prompt
cd text-generation-webui (wherever you have it stored on your disk)
cmd_windows.bat **(cmd_windows.bat will activate your environment. Linux and Mac options are there too)**
cd extensions
git clone https://github.com/kanttouchthis/text_generation_webui_xtts
cd text_generation_webui_xtt_Alts
pip install -r requirements.txt
pip install TTS --no-dependencies
cd back up to the text-generation-webui folder.
Run Start_windows.bat
Agree to the license and let it download the other files it needs. (Ensure it's activated on the "session" tab and apply/restart.)
With all that done, it's running fine! :) No audio repeats etc.
One thing I do notice: it keeps the generated audio in \text-generation-webui\extensions\text_generation_webui_xtt_Alts\generated so that may need cleaning up from time to time.
I'm sure the changes will get merged back into the original on here at some point!
Thanks for everyone's help and work on this!
Followed the steps but it still gives me a
ERROR:Failed to load the extension "text_generation_webui_xtt_Alts".
Traceback (most recent call last):
File "C:\text-generation-webui\modules\extensions.py", line 36, in load_extensions
exec(f"import extensions.{name}.script")
File "
When restarting the webui after activating it in the session tab
If you are using Windows, follow these instructions. I've made a video to go with them. These instructions will show you how to install TTS.
Thanks for your help!
I added an option in config.json to delete old generated audio files on startup.
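For anyone who wants to clean up by hand in the meantime, a rough sketch of an age-based cleanup could look like this. It is not the extension's actual code: the function name and the age threshold are my own assumptions, and you would point it at your own `generated` folder.

```python
import time
from pathlib import Path


def delete_old_wavs(folder, max_age_seconds):
    """Delete .wav files in `folder` older than `max_age_seconds`; return their names."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # nothing to clean if the folder does not exist
    cutoff = time.time() - max_age_seconds
    removed = []
    for wav in sorted(folder.glob("*.wav")):
        # Compare the file's last-modified time against the cutoff.
        if wav.stat().st_mtime < cutoff:
            wav.unlink()
            removed.append(wav.name)
    return removed
```

Calling it with something like `delete_old_wavs("generated", 24 * 3600)` would remove clips older than a day and leave non-.wav files alone.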
A quick note on speed vs quality etc., as it's not mentioned anywhere else. I notice the sample voice file used to generate audio is about 7 seconds long, mono (not stereo), PCM S16 LE, with a sample rate of 22050Hz and 16 bits per sample.
I'm guessing there are a few factors that may speed up processing.
- Keeping it at the lower quality, like the original file.
- Fewer seconds in length (I think somewhere it says you need 4 to 12 seconds as a sample)
I tried a very simple test using a 22050Hz sample voice and a 44100Hz sample voice (9 second mono sample).
22050Hz > Processing time: 59.185802936553955
44100Hz > Processing time: 125.19529104232788
This was generating the same amount of speech. It's not highly scientific and wasn't run over thousands of tests, but it would appear that if you want to use your favourite celebrity voice, you should get a high quality sample, make it mono, drop its sample rate to 22050Hz and keep it around the 4-9 second mark. (I suspect a shorter voice sample will probably be faster.)
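If you want to check what your own voice sample looks like before trying it, here is a small sketch using only Python's standard `wave` module. The function name `describe_wav` is my own, and it only handles plain PCM WAV files.

```python
import wave


def describe_wav(path):
    """Return channel count, sample rate, bit depth and duration of a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),           # 1 = mono, 2 = stereo
            "sample_rate_hz": rate,                   # e.g. 22050 or 44100
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }
```

A sample like the one described above should report 1 channel, 22050 Hz, 16 bits and a duration of roughly 7 seconds.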
The model outputs 24 kHz mono files, so I presume that is the ideal format for samples as well. I could potentially write code to automatically resample the input files.
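To give an idea of what automatic resampling involves, here is a deliberately simple sketch that resamples a mono list of samples by linear interpolation. It is not what the extension does, the function name is made up, and a real implementation would more likely rely on a dedicated audio library with proper low-pass filtering, since plain interpolation can alias when downsampling.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono sequence of samples by linear interpolation (sketch only)."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate  # fractional position in the source signal
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # past the last pair: hold the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out
```

For example, one second of 44100 Hz audio passed through `resample_linear(samples, 44100, 24000)` comes back as 24000 samples.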
Yeass! You got the repo fixed up, thank you again for making this. It is one of the last missing pieces for AI interactions, the speed and quality is above everything else.
Alright I got it to work! The problem was I installed TTS in textgen and not in the base environment
As long as you have textgen activated when running the webui that shouldn't be an issue
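A quick way to see which environment you are actually in is to ask Python itself. This is just a sketch: run it from the same prompt you start the webui from. The helper name `module_available` is my own, and `TTS` is the package name from the pip command earlier in this thread.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


# The interpreter path shows which environment is active.
print("Interpreter:", sys.executable)
print("TTS importable:", module_available("TTS"))
```

If the interpreter path does not point inside the environment you installed TTS into, the extension will fail to import it.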
Firstly, thank you for taking the time to do this!!! OMG it's fast, does perfect inflections, this is eleven labs quality on my local machine AMAZING!!!!!
Here is some information to make the extension work a bit better. I'm on a Windows machine, so my experience might be unique to that.
Change this:
    def history_modifier(history):
        if len(history["internal"]) > 0:
            history["visible"][-1] = [
                history["visible"][-1][0],
                history["visible"][-1][1].replace("controls autoplay>", "controls>")
            ]
        return history
to this:
    def history_modifier(history):
        if len(history["internal"]) > 0:
            history["visible"][-1] = [
                history["visible"][-1][0],
                history["visible"][-1][1].replace(
                    "controls autoplay style=\"height: 30px;\">",
                    "controls style=\"height: 30px;\">")
            ]
        return history
text-generation-webui-xtts
to:
text_generation_webui_xtts
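My assumption about why the underscore version matters (it is not stated above): the traceback earlier in this thread shows the webui loading extensions with an `import extensions.{name}.script` statement, and a hyphen is not allowed in a name used in a Python import statement. A small sketch to check a folder name, with a helper name I made up:

```python
import keyword


def importable_folder_name(name):
    """True if `name` is usable as a module name in an `import` statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


for folder in ("text-generation-webui-xtts", "text_generation_webui_xtts"):
    print(folder, "->", importable_folder_name(folder))
```

The hyphenated name fails the check and the underscore name passes it.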
Seriously amazing stuff, thank you again for integrating this into oobabooga. I will do a pr just to have a copy to mess around with, but I'll direct people to this repo.