neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu #651

Open Kaiburr21 opened 1 year ago

Kaiburr21 commented 1 year ago

Can anyone please help me solve this issue? I get this error and can't find a solution. Is it possible that my GTX 1060 6GB, together with 16 GB of RAM, is not enough to train a voice model from audio? I have installed CUDA 12.2 with Python 3.10.6 on Windows 10. I read a possible solution on another forum: "I don't know if the author solved the problem, but this is a GTX 1080 Ti video card problem. It can be solved by strictly setting the version torch==2.0.1 in setup-cuda.bat. You also still need to change the version to bitsandbytes==0.38.1 in modules/dlas/requirements.txt (at the bottom of the file)."
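The pinning step described in the quoted forum post can be sketched in Python. This is a minimal illustration, not part of the ai-voice-cloning scripts: `pin_requirement` is a hypothetical helper, and it assumes the requirements file has one plain `package` or `package<op>version` entry per line.

```python
import re


def pin_requirement(lines, package, version):
    """Return a copy of `lines` with `package` pinned to exactly `version`.

    `lines` are requirement entries without trailing newlines. If the
    package is not listed at all, the pin is appended at the end.
    (Hypothetical helper, written only for illustration.)
    """
    # Match the bare package name, optionally followed by a version specifier.
    pattern = re.compile(
        rf"^\s*{re.escape(package)}\s*([=<>!~].*)?$", re.IGNORECASE
    )
    pinned = f"{package}=={version}"
    result, found = [], False
    for line in lines:
        if pattern.match(line):
            result.append(pinned)
            found = True
        else:
            result.append(line)
    if not found:
        result.append(pinned)
    return result


# Example with made-up file contents:
example = ["torch", "torchaudio", "bitsandbytes>=0.35"]
print(pin_requirement(example, "bitsandbytes", "0.38.1"))
```

To apply it to a real file, read `modules/dlas/requirements.txt` with `splitlines()`, pass the result through the helper, and write the lines back. Whether these particular versions fix the error on a given card is the forum poster's claim, not something this sketch verifies.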

Even if that would solve my problem, I can't find the "torch==2.0.1 in setup-cuda.bat" file. I would be very grateful for any help and advice to get my training up and running! :) The CMD output is below:

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug False
C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: C:\Users\danik\ai-voice-cloning\modules\tortoise-tts\tortoise../tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: C:\Users\danik\ai-voice-cloning\models\tortoise\autoregressive.pth
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
Unloaded TTS
Loading specialized model for language: en
Loading Whisper model: base.en
Loading Whisper model: base.en
Loaded Whisper model
C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torchaudio\functional\functional.py:1371: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
warnings.warn(
Text length too long (200 < 4870), using segments: estwood2.wav
Unloaded Whisper
Spawning process: train.bat ./training/estwood2/train.yaml
[Training] [2023-10-22T23:34:46.362413]
[Training] [2023-10-22T23:34:46.366424] (venv) C:\Users\danik\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-10-22T23:34:48.343417] [2023-10-22 23:34:48,343] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-10-22T23:34:50.973560] 23-10-22 23:34:50.973 - INFO: name: estwood2 [Training] [2023-10-22T23:34:50.977531] model: extensibletrainer [Training] [2023-10-22T23:34:50.981539] scale: 1 [Training] [2023-10-22T23:34:50.986506] gpu_ids: [0] [Training] [2023-10-22T23:34:50.990516] start_step: 0 [Training] [2023-10-22T23:34:50.993524] checkpointing_enabled: True [Training] [2023-10-22T23:34:50.997477] fp16: False [Training] [2023-10-22T23:34:51.000485] bitsandbytes: True [Training] [2023-10-22T23:34:51.004476] gpus: 1 [Training] [2023-10-22T23:34:51.007478] datasets:[ [Training] [2023-10-22T23:34:51.010462] train:[ [Training] [2023-10-22T23:34:51.013453] name: training [Training] [2023-10-22T23:34:51.017424] n_workers: 2 [Training] [2023-10-22T23:34:51.020417] batch_size: 66 [Training] [2023-10-22T23:34:51.024405] mode: paired_voice_audio [Training] [2023-10-22T23:34:51.029392] path: ./training/estwood2/train.txt [Training] [2023-10-22T23:34:51.033407] fetcher_mode: ['lj'] [Training] [2023-10-22T23:34:51.036412] phase: train [Training] [2023-10-22T23:34:51.040387] max_wav_length: 255995 [Training] [2023-10-22T23:34:51.043379] max_text_length: 200 [Training] [2023-10-22T23:34:51.046372] sample_rate: 22050 [Training] [2023-10-22T23:34:51.053359] load_conditioning: True [Training] [2023-10-22T23:34:51.057347] num_conditioning_candidates: 2 [Training] [2023-10-22T23:34:51.060340] conditioning_length: 44000 [Training] [2023-10-22T23:34:51.064299] use_bpe_tokenizer: True [Training] [2023-10-22T23:34:51.067315] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json [Training] [2023-10-22T23:34:51.071279] load_aligned_codes: False [Training] [2023-10-22T23:34:51.074272] data_type: img [Training] [2023-10-22T23:34:51.077264] ] [Training] [2023-10-22T23:34:51.080257] val:[ [Training] [2023-10-22T23:34:51.084270] name: validation [Training] [2023-10-22T23:34:51.087264] n_workers: 2 [Training] [2023-10-22T23:34:51.091226] batch_size: 2 [Training] 
[2023-10-22T23:34:51.094218] mode: paired_voice_audio [Training] [2023-10-22T23:34:51.098208] path: ./training/estwood2/validation.txt [Training] [2023-10-22T23:34:51.101200] fetcher_mode: ['lj'] [Training] [2023-10-22T23:34:51.104220] phase: val [Training] [2023-10-22T23:34:51.107216] max_wav_length: 255995 [Training] [2023-10-22T23:34:51.111202] max_text_length: 200 [Training] [2023-10-22T23:34:51.114194] sample_rate: 22050 [Training] [2023-10-22T23:34:51.117157] load_conditioning: True [Training] [2023-10-22T23:34:51.121176] num_conditioning_candidates: 2 [Training] [2023-10-22T23:34:51.125160] conditioning_length: 44000 [Training] [2023-10-22T23:34:51.128152] use_bpe_tokenizer: True [Training] [2023-10-22T23:34:51.131120] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json [Training] [2023-10-22T23:34:51.134111] load_aligned_codes: False [Training] [2023-10-22T23:34:51.138102] data_type: img [Training] [2023-10-22T23:34:51.141093] ] [Training] [2023-10-22T23:34:51.144085] ] [Training] [2023-10-22T23:34:51.148074] steps:[ [Training] [2023-10-22T23:34:51.151096] gpt_train:[ [Training] [2023-10-22T23:34:51.154059] training: gpt [Training] [2023-10-22T23:34:51.157050] loss_log_buffer: 500 [Training] [2023-10-22T23:34:51.160042] optimizer: adamw [Training] [2023-10-22T23:34:51.164033] optimizer_params:[ [Training] [2023-10-22T23:34:51.167023] lr: 1e-05 [Training] [2023-10-22T23:34:51.170016] weight_decay: 0.01 [Training] [2023-10-22T23:34:51.173007] beta1: 0.9 [Training] [2023-10-22T23:34:51.176000] beta2: 0.96 [Training] [2023-10-22T23:34:51.178992] ] [Training] [2023-10-22T23:34:51.182981] clip_grad_eps: 4 [Training] [2023-10-22T23:34:51.185973] injectors:[ [Training] [2023-10-22T23:34:51.188965] paired_to_mel:[ [Training] [2023-10-22T23:34:51.192954] type: torch_mel_spectrogram [Training] [2023-10-22T23:34:51.195947] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth [Training] [2023-10-22T23:34:51.198939] in: wav [Training] 
[2023-10-22T23:34:51.202928] out: paired_mel [Training] [2023-10-22T23:34:51.205919] ] [Training] [2023-10-22T23:34:51.208912] paired_cond_to_mel:[ [Training] [2023-10-22T23:34:51.212901] type: for_each [Training] [2023-10-22T23:34:51.215893] subtype: torch_mel_spectrogram [Training] [2023-10-22T23:34:51.218885] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth [Training] [2023-10-22T23:34:51.222874] in: conditioning [Training] [2023-10-22T23:34:51.225873] out: paired_conditioning_mel [Training] [2023-10-22T23:34:51.228858] ] [Training] [2023-10-22T23:34:51.232853] to_codes:[ [Training] [2023-10-22T23:34:51.235839] type: discrete_token [Training] [2023-10-22T23:34:51.238847] in: paired_mel [Training] [2023-10-22T23:34:51.241823] out: paired_mel_codes [Training] [2023-10-22T23:34:51.244815] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml [Training] [2023-10-22T23:34:51.248805] ] [Training] [2023-10-22T23:34:51.251797] paired_fwd_text:[ [Training] [2023-10-22T23:34:51.255786] type: generator [Training] [2023-10-22T23:34:51.258778] generator: gpt [Training] [2023-10-22T23:34:51.261778] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths'] [Training] [2023-10-22T23:34:51.265759] out: ['loss_text_ce', 'loss_mel_ce', 'logits'] [Training] [2023-10-22T23:34:51.269749] ] [Training] [2023-10-22T23:34:51.272744] ] [Training] [2023-10-22T23:34:51.275745] losses:[ [Training] [2023-10-22T23:34:51.278725] text_ce:[ [Training] [2023-10-22T23:34:51.281716] type: direct [Training] [2023-10-22T23:34:51.284708] weight: 0.01 [Training] [2023-10-22T23:34:51.287704] key: loss_text_ce [Training] [2023-10-22T23:34:51.290693] ] [Training] [2023-10-22T23:34:51.293685] mel_ce:[ [Training] [2023-10-22T23:34:51.297674] type: direct [Training] [2023-10-22T23:34:51.300666] weight: 1 [Training] [2023-10-22T23:34:51.304655] key: loss_mel_ce [Training] [2023-10-22T23:34:51.307647] ] [Training] 
[2023-10-22T23:34:51.310639] ] [Training] [2023-10-22T23:34:51.314629] ] [Training] [2023-10-22T23:34:51.317621] ] [Training] [2023-10-22T23:34:51.320638] networks:[ [Training] [2023-10-22T23:34:51.323629] gpt:[ [Training] [2023-10-22T23:34:51.327594] type: generator [Training] [2023-10-22T23:34:51.331609] which_model_G: unified_voice2 [Training] [2023-10-22T23:34:51.334606] kwargs:[ [Training] [2023-10-22T23:34:51.337567] layers: 30 [Training] [2023-10-22T23:34:51.341585] model_dim: 1024 [Training] [2023-10-22T23:34:51.344573] heads: 16 [Training] [2023-10-22T23:34:51.347565] max_text_tokens: 402 [Training] [2023-10-22T23:34:51.350533] max_mel_tokens: 604 [Training] [2023-10-22T23:34:51.353550] max_conditioning_inputs: 2 [Training] [2023-10-22T23:34:51.356541] mel_length_compression: 1024 [Training] [2023-10-22T23:34:51.360550] number_text_tokens: 256 [Training] [2023-10-22T23:34:51.363523] number_mel_codes: 8194 [Training] [2023-10-22T23:34:51.367487] start_mel_token: 8192 [Training] [2023-10-22T23:34:51.370503] stop_mel_token: 8193 [Training] [2023-10-22T23:34:51.373515] start_text_token: 255 [Training] [2023-10-22T23:34:51.376489] train_solo_embeddings: False [Training] [2023-10-22T23:34:51.379480] use_mel_codes_as_input: True [Training] [2023-10-22T23:34:51.383478] checkpointing: True [Training] [2023-10-22T23:34:51.386436] tortoise_compat: True [Training] [2023-10-22T23:34:51.389460] ] [Training] [2023-10-22T23:34:51.393450] ] [Training] [2023-10-22T23:34:51.396410] ] [Training] [2023-10-22T23:34:51.401397] path:[ [Training] [2023-10-22T23:34:51.404389] strict_load: True [Training] [2023-10-22T23:34:51.408402] pretrain_model_gpt: ./models/tortoise/autoregressive.pth [Training] [2023-10-22T23:34:51.411369] root: ./ [Training] [2023-10-22T23:34:51.415359] experiments_root: ./training\estwood2\finetune [Training] [2023-10-22T23:34:51.418359] models: ./training\estwood2\finetune\models [Training] [2023-10-22T23:34:51.421343] training_state: 
./training\estwood2\finetune\training_state [Training] [2023-10-22T23:34:51.425363] log: ./training\estwood2\finetune [Training] [2023-10-22T23:34:51.428356] val_images: ./training\estwood2\finetune\val_images [Training] [2023-10-22T23:34:51.432313] ] [Training] [2023-10-22T23:34:51.435336] train:[ [Training] [2023-10-22T23:34:51.439320] niter: 500 [Training] [2023-10-22T23:34:51.442323] warmup_iter: -1 [Training] [2023-10-22T23:34:51.445303] mega_batch_factor: 33 [Training] [2023-10-22T23:34:51.448303] val_freq: 5 [Training] [2023-10-22T23:34:51.451287] ema_enabled: False [Training] [2023-10-22T23:34:51.454283] default_lr_scheme: MultiStepLR [Training] [2023-10-22T23:34:51.457272] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50] [Training] [2023-10-22T23:34:51.460238] lr_gamma: 0.5 [Training] [2023-10-22T23:34:51.463256] ] [Training] [2023-10-22T23:34:51.466223] eval:[ [Training] [2023-10-22T23:34:51.469214] pure: False [Training] [2023-10-22T23:34:51.472207] output_state: gen [Training] [2023-10-22T23:34:51.475199] ] [Training] [2023-10-22T23:34:51.478190] logger:[ [Training] [2023-10-22T23:34:51.480185] save_checkpoint_freq: 5 [Training] [2023-10-22T23:34:51.483177] visuals: ['gen', 'mel'] [Training] [2023-10-22T23:34:51.486196] visual_debug_rate: 5 [Training] [2023-10-22T23:34:51.489186] is_mel_spectrogram: True [Training] [2023-10-22T23:34:51.492153] ] [Training] [2023-10-22T23:34:51.495145] is_train: True [Training] [2023-10-22T23:34:51.498138] dist: False [Training] [2023-10-22T23:34:51.501129] [Training] [2023-10-22T23:34:51.504146] 23-10-22 23:34:50.973 - INFO: Random seed: 7467 [Training] [2023-10-22T23:34:52.281889] 23-10-22 23:34:52.281 - INFO: Number of training data elements: 66, iters: 1 [Training] [2023-10-22T23:34:52.284854] 23-10-22 23:34:52.281 - INFO: Total epochs needed: 500 for iters 500 [Training] [2023-10-22T23:34:53.527997] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing 
gradient_checkpointing to a config initialization is deprecated and will be removed in v5 Transformers. Using model.gradient_checkpointing_enable() instead, or if you are using the Trainer API, pass gradient_checkpointing=True in your TrainingArguments. [Training] [2023-10-22T23:34:53.530990] warnings.warn( [Training] [2023-10-22T23:35:03.019533] 23-10-22 23:35:03.018 - INFO: Loading model for [./models/tortoise/autoregressive.pth] [Training] [2023-10-22T23:35:04.040195] 23-10-22 23:35:04.035 - INFO: Start training from epoch: 0, iter: 0 [Training] [2023-10-22T23:35:08.447475] [2023-10-22 23:35:08,447] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. [Training] [2023-10-22T23:35:10.417196] [2023-10-22 23:35:10,417] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs. [Training] [2023-10-22T23:35:11.938394] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate [Training] [2023-10-22T23:35:11.938394] warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). " [Training] [2023-10-22T23:35:13.403889] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. 
Refer to docs for more details on the differences between the two variants.
[Training] [2023-10-22T23:35:13.403889] warnings.warn(
[Training] [2023-10-22T23:38:33.905473] Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
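In CUDA, "no kernel image is available for execution on the device" generally means the compiled library contains no code built for the GPU's compute capability. Here is a minimal sketch of how one might check that for the PyTorch build in the venv. It only inspects PyTorch itself; the error above was raised from bitsandbytes' own `ops.cu`, whose compiled targets this script does not see, so treat the result as a way to narrow things down. `has_kernel_for` and `report` are hypothetical helper names.

```python
def has_kernel_for(capability, arch_list):
    """Check whether a build's target list covers a device capability.

    capability: (major, minor) tuple, e.g. (6, 1) for a GTX 1060.
    arch_list:  strings such as "sm_61" (binary) or "compute_61" (PTX),
                the format returned by torch.cuda.get_arch_list().
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition("_")
        if not digits.isdigit():
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        # Binary code runs on devices with the same major version and an
        # equal or higher minor version.
        if kind == "sm" and a_major == major and a_minor <= minor:
            return True
        # PTX can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            return True
    return False


def report():
    """Describe the local GPU and PyTorch build, if both are present."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        return "torch does not see a CUDA device"
    capability = torch.cuda.get_device_capability(0)
    archs = torch.cuda.get_arch_list()
    verdict = "covered" if has_kernel_for(capability, archs) else "NOT covered"
    return (f"{torch.cuda.get_device_name(0)}: capability {capability}, "
            f"torch {torch.__version__} (CUDA {torch.version.cuda}), "
            f"build targets {archs} -> {verdict}")
```

Running `print(report())` inside the activated venv shows the values to compare. If PyTorch itself covers the card, the next suspect is the installed bitsandbytes binary.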

TwoThirdsBand commented 1 year ago

Were you able to fix this?

HarryBols commented 9 months ago

Were you able to fix this?

"@pb2806 3 hours ago Either you turn off BitsAndBytes in the Training configuration tab, or you make sure that the Torch, TorchAudio and TorchVision files in Runtime/Libs have the same version number as the CUDA installed on your computer. Jarod's files come with Torch files that have the version number 118, so I was having the Kernel error because of my CUDA version number 121." from YT