neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Attempted CUDA usage despite indicating its disablement. #581

Open Chigoma333 opened 1 year ago

Chigoma333 commented 1 year ago

The code initially indicates that CUDA will be disabled but continues to attempt its use later, resulting in failures. The intention is to run the code exclusively on the CPU.

Code:

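import tempfile

import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

def init_tortoise(use_deepspeed, kv_cache, half, num_autoregressive_samples):
    # num_autoregressive_samples is accepted here but not used
    tts = TextToSpeech(use_deepspeed=use_deepspeed, kv_cache=kv_cache, half=half)
    return tts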

def generate_tortoise(text_input, tts, diffusion_iterations, num_autoregressive_samples, temperature, CUSTOM_VOICE_NAME):
    extra_voice_dirs = ["voices"]
    voice_samples, conditioning_latents = load_voice(CUSTOM_VOICE_NAME, extra_voice_dirs=extra_voice_dirs)

    gen = tts.tts_with_preset(text_input,
                voice_samples=voice_samples,
                conditioning_latents=conditioning_latents, 
                preset="fast",
                diffusion_iterations=diffusion_iterations,
                num_autoregressive_samples=num_autoregressive_samples, 
                cond_free=True, 
                temperature=temperature)

    # Create a temporary WAV file to save the generated audio
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as temp_file:
        torchaudio.save(temp_file, gen.squeeze(0).cpu(), 24000, format="wav")
        temp_file_path = temp_file.name  # Store the temporary file path

    return temp_file_path

/home/chigoma333/.local/lib/python3.11/site-packages/torch/amp/autocast_mode.py:204: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
100%|██████████| 2/2 [00:09<00:00, 4.59s/it]
Computing best candidates using CLVP
100%|██████████| 2/2 [00:00<00:00, 2.48it/s]
Transforming autoregressive outputs into audio..
100%|██████████| 50/50 [00:19<00:00, 2.55it/s]
Ignoring exception in command generate_tortoise_test:
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/discord/commands/core.py", line 124, in wrapped
    ret = await coro(arg)
          ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/discord/commands/core.py", line 982, in _invoke
    await self.callback(ctx, **kwargs)
  File "/home/chigoma333/Desktop/Program/Discord_bot/main.py", line 229, in generate_tortoise_test
    file = await generate_tts_thread(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/Desktop/Program/Discord_bot/main.py", line 296, in generate_tts_thread
    result = await bot.loop.run_in_executor(executor, generate_tortoise, text, tortoise_tts, tortoise_diffusion_iterations, tortoise_num_autoregressive_samples, tortoise_temperature, tortoise_voice)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/Desktop/Program/Discord_bot/text2speech.py", line 36, in generate_tortoise
    gen = tts.tts_with_preset(text_input,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 346, in tts_with_preset
    return self.tts(text, **settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 601, in tts
    wav_candidates = [potentially_redact(wav_candidate, text) for wav_candidate in wav_candidates]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 601, in <listcomp>
    wav_candidates = [potentially_redact(wav_candidate, text) for wav_candidate in wav_candidates]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 599, in potentially_redact
    return self.aligner.redact(clip.squeeze(1), text).unsqueeze(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/utils/wav2vec_alignment.py", line 144, in redact
    alignments = self.align(audio, bare_text, audio_sample_rate)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/utils/wav2vec_alignment.py", line 62, in align
    self.model = self.model.to(self.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1900, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
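
The last frames of the traceback show self.model.to(self.device) in wav2vec_alignment.py reaching torch.cuda's lazy initialisation, which suggests the aligner's device was set to CUDA without checking whether CUDA is usable. A minimal sketch of a guarded device choice is below. pick_device is a hypothetical helper, not part of tortoise-tts, and whether the library should be changed this way is an assumption; the only PyTorch call used is torch.cuda.is_available().

```python
def pick_device(preferred: str = "cuda") -> str:
    """Return "cuda" only when it is actually usable, otherwise "cpu".

    Hypothetical helper for illustration; not part of tortoise-tts.
    """
    if preferred == "cuda":
        try:
            import torch
        except ImportError:
            # Without PyTorch there is no CUDA backend to use.
            return "cpu"
        return "cuda" if torch.cuda.is_available() else "cpu"
    # Any other requested device (for example "cpu") is returned unchanged.
    return preferred


if __name__ == "__main__":
    device = pick_device()
    print(f"Using device: {device}")
    # A model would then be moved with model.to(device)
    # rather than with a hard-coded "cuda".
```

On a machine without an NVIDIA driver, this kind of check would select the CPU before any .to() call is made, instead of failing inside torch.cuda.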

Chigoma333 commented 1 year ago

I did some further testing. Here is my minimal code to reproduce. (I also tested without setting diffusion_iterations and num_autoregressive_samples; it still produces the error. It simply takes too long on the CPU if not set low.)

from tortoise.api import TextToSpeech
import torchaudio

def init_tortoise():
    tts = TextToSpeech()
    return tts

def generate_tortoise(text_input, tts):

    print(text_input)

    gen = tts.tts_with_preset(text_input,
                    diffusion_iterations=diffusion_iterations,
                    num_autoregressive_samples=num_autoregressive_samples
                    )

    torchaudio.save(f'tortoise.wav', gen.squeeze(0).cpu(), 24000)
    return f'tortoise.wav'

num_autoregressive_samples = 2
diffusion_iterations=50

tts = init_tortoise()
generate_tortoise("[sad] Hello this is a tortoise generation test.", tts)

The error:

Generating autoregressive samples..
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:19<00:00, 10.00s/it]
Computing best candidates using CLVP
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  3.01it/s]
Transforming autoregressive outputs into audio..
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:16<00:00,  3.05it/s]
Traceback (most recent call last):
  File "/home/chigoma333/Desktop/Program/Totoise_test/main.py", line 24, in <module>
    generate_tortoise("[I am sad]Hello this is a tortoise generation test.", tts)
  File "/home/chigoma333/Desktop/Program/Totoise_test/main.py", line 12, in generate_tortoise
    gen = tts.tts_with_preset(text_input,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 346, in tts_with_preset
    return self.tts(text, **settings)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 601, in tts
    wav_candidates = [potentially_redact(wav_candidate, text) for wav_candidate in wav_candidates]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 601, in <listcomp>
    wav_candidates = [potentially_redact(wav_candidate, text) for wav_candidate in wav_candidates]
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/api.py", line 599, in potentially_redact
    return self.aligner.redact(clip.squeeze(1), text).unsqueeze(1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/utils/wav2vec_alignment.py", line 144, in redact
    alignments = self.align(audio, bare_text, audio_sample_rate)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/tortoise/utils/wav2vec_alignment.py", line 62, in align
    self.model = self.model.to(self.device)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 1900, in to
    return super().to(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/chigoma333/.local/lib/python3.11/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

What I found is that the text input "Hello, this is a tortoise generation test." doesn't produce this error, but "[sad] Hello, this is a tortoise generation test." does. I also tested "[sad]Hello, this is a tortoise generation test." (without a space between "]" and "Hello"). I have no idea why it tries to use CUDA when something is given in square brackets.
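
Judging from the traceback, the failing call sits inside potentially_redact, which hands the clip to self.aligner.redact. One reading, stated here as an assumption and not confirmed against the tortoise source, is that this alignment path only runs when the text contains a bracketed span, which would explain why the plain sentence succeeds. The check below is a hypothetical stand-in for such a condition; has_bracketed_prompt is not a tortoise-tts function.

```python
import re

# Matches a "[...]" span such as "[sad]" or "[I am sad]".
BRACKETED = re.compile(r"\[[^\]]*\]")


def has_bracketed_prompt(text: str) -> bool:
    """Return True if the text contains a [bracketed] span.

    Hypothetical helper; it only illustrates which inputs would
    reach a redaction/alignment step under the assumption above.
    """
    return BRACKETED.search(text) is not None


if __name__ == "__main__":
    samples = [
        "Hello, this is a tortoise generation test.",
        "[sad] Hello, this is a tortoise generation test.",
        "[sad]Hello, this is a tortoise generation test.",
    ]
    for sample in samples:
        print(has_bracketed_prompt(sample), sample)
```

If that reading is right, the space after "]" would make no difference, since both bracketed variants contain a matching span.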

The warning message "UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')" doesn't seem to have anything to do with this issue. It appears when someone tries to use 'half=True' in 'TextToSpeech()', so it doesn't matter here.

manmay-nakhashi commented 1 year ago

Prompting doesn't work well with this model; it's better if you don't use it.

manmay-nakhashi commented 1 year ago

Anyway, I'll look into this after some time.

Chigoma333 commented 1 year ago

Thank you for your response. I looked into it myself and fixed the issue. (I'm mentioning this because I'm unsure if mentioning an issue will trigger a notification for you before you check.)

Bromancelot commented 11 months ago

> Thank you for your response. I looked into it myself and fixed the issue. (I'm mentioning this because I'm unsure if mentioning an issue will trigger a notification for you before you check.)

How did you fix it?

Chigoma333 commented 11 months ago

Look at my pull request: https://github.com/neonbjb/tortoise-tts/pull/583