Kaiburr21 opened this issue 1 year ago
Were you able to fix this?
"@pb2806 3 hours ago Either you turn off BitsAndBytes in the Training configuration tab, or you make sure that the Torch, TorchAudio and TorchVision files in Runtime/Libs have the same version number as the CUDA installed on your computer. Jarod's files come with Torch files that have the version number 118, so I was having the Kernel error because of my CUDA version number 121." from YT
Can anyone please help me solve this issue? I have this error and I can't find the solution. Is it possible that my GTX 1060 6GB with 16GB of RAM is not enough to train a voice model from audio? I have CUDA 12.2 installed with Python 3.10.6 on Windows 10. I read a possible solution on another forum:

> I don't know if the author solved the problem, but this is a GTX 1080 Ti video card problem. It can be solved by strictly setting the version to torch==2.0.1 in setup-cuda.bat. You also still need to change the version to bitsandbytes==0.38.1 in modules/dlas/requirements.txt (at the bottom of the file).

Even if that would solve my problem, I can't find `torch==2.0.1` anywhere in the `setup-cuda.bat` file (see the pinning sketch after the log below). I would be very grateful for any help and advice to get my training up and running! :) The CMD output is below:
```
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Loading TorToiSe... (AR: None, diffusion: None, vocoder: bigvgan_24khz_100band)
Hardware acceleration found: cuda
use_deepspeed api_debug False
C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\nn\utils\weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Loading tokenizer JSON: C:\Users\danik\ai-voice-cloning\modules\tortoise-tts\tortoise../tortoise/data/tokenizer.json
Loaded tokenizer
Loading autoregressive model: C:\Users\danik\ai-voice-cloning\models\tortoise\autoregressive.pth
Loaded autoregressive model
Loaded diffusion model
Loading vocoder model: bigvgan_24khz_100band
Loading vocoder model: bigvgan_24khz_100band.pth
Removing weight norm...
Loaded vocoder model
Loaded TTS, ready for generation.
Unloaded TTS
Loading specialized model for language: en
Loading Whisper model: base.en
Loading Whisper model: base.en
Loaded Whisper model
C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torchaudio\functional\functional.py:1371: UserWarning: "kaiser_window" resampling method name is being deprecated and replaced by "sinc_interp_kaiser" in the next release. The default behavior remains unchanged.
  warnings.warn(
Text length too long (200 < 4870), using segments: estwood2.wav
Unloaded Whisper
Spawning process: train.bat ./training/estwood2/train.yaml
[Training] [2023-10-22T23:34:46.362413]
[Training] [2023-10-22T23:34:46.366424] (venv) C:\Users\danik\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2023-10-22T23:34:48.343417] [2023-10-22 23:34:48,343] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-10-22T23:34:50.973560] 23-10-22 23:34:50.973 - INFO: name: estwood2
[Training] [2023-10-22T23:34:50.977531] model: extensibletrainer
[Training] [2023-10-22T23:34:50.981539] scale: 1
[Training] [2023-10-22T23:34:50.986506] gpu_ids: [0]
[Training] [2023-10-22T23:34:50.990516] start_step: 0
[Training] [2023-10-22T23:34:50.993524] checkpointing_enabled: True
[Training] [2023-10-22T23:34:50.997477] fp16: False
[Training] [2023-10-22T23:34:51.000485] bitsandbytes: True
[Training] [2023-10-22T23:34:51.004476] gpus: 1
[Training] [2023-10-22T23:34:51.007478] datasets:[
[Training] [2023-10-22T23:34:51.010462] train:[
[Training] [2023-10-22T23:34:51.013453] name: training
[Training] [2023-10-22T23:34:51.017424] n_workers: 2
[Training] [2023-10-22T23:34:51.020417] batch_size: 66
[Training] [2023-10-22T23:34:51.024405] mode: paired_voice_audio
[Training] [2023-10-22T23:34:51.029392] path: ./training/estwood2/train.txt
[Training] [2023-10-22T23:34:51.033407] fetcher_mode: ['lj']
[Training] [2023-10-22T23:34:51.036412] phase: train
[Training] [2023-10-22T23:34:51.040387] max_wav_length: 255995
[Training] [2023-10-22T23:34:51.043379] max_text_length: 200
[Training] [2023-10-22T23:34:51.046372] sample_rate: 22050
[Training] [2023-10-22T23:34:51.053359] load_conditioning: True
[Training] [2023-10-22T23:34:51.057347] num_conditioning_candidates: 2
[Training] [2023-10-22T23:34:51.060340] conditioning_length: 44000
[Training] [2023-10-22T23:34:51.064299] use_bpe_tokenizer: True
[Training] [2023-10-22T23:34:51.067315] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-10-22T23:34:51.071279] load_aligned_codes: False
[Training] [2023-10-22T23:34:51.074272] data_type: img
[Training] [2023-10-22T23:34:51.077264] ]
[Training] [2023-10-22T23:34:51.080257] val:[
[Training] [2023-10-22T23:34:51.084270] name: validation
[Training] [2023-10-22T23:34:51.087264] n_workers: 2
[Training] [2023-10-22T23:34:51.091226] batch_size: 2
[Training] [2023-10-22T23:34:51.094218] mode: paired_voice_audio
[Training] [2023-10-22T23:34:51.098208] path: ./training/estwood2/validation.txt
[Training] [2023-10-22T23:34:51.101200] fetcher_mode: ['lj']
[Training] [2023-10-22T23:34:51.104220] phase: val
[Training] [2023-10-22T23:34:51.107216] max_wav_length: 255995
[Training] [2023-10-22T23:34:51.111202] max_text_length: 200
[Training] [2023-10-22T23:34:51.114194] sample_rate: 22050
[Training] [2023-10-22T23:34:51.117157] load_conditioning: True
[Training] [2023-10-22T23:34:51.121176] num_conditioning_candidates: 2
[Training] [2023-10-22T23:34:51.125160] conditioning_length: 44000
[Training] [2023-10-22T23:34:51.128152] use_bpe_tokenizer: True
[Training] [2023-10-22T23:34:51.131120] tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-10-22T23:34:51.134111] load_aligned_codes: False
[Training] [2023-10-22T23:34:51.138102] data_type: img
[Training] [2023-10-22T23:34:51.141093] ]
[Training] [2023-10-22T23:34:51.144085] ]
[Training] [2023-10-22T23:34:51.148074] steps:[
[Training] [2023-10-22T23:34:51.151096] gpt_train:[
[Training] [2023-10-22T23:34:51.154059] training: gpt
[Training] [2023-10-22T23:34:51.157050] loss_log_buffer: 500
[Training] [2023-10-22T23:34:51.160042] optimizer: adamw
[Training] [2023-10-22T23:34:51.164033] optimizer_params:[
[Training] [2023-10-22T23:34:51.167023] lr: 1e-05
[Training] [2023-10-22T23:34:51.170016] weight_decay: 0.01
[Training] [2023-10-22T23:34:51.173007] beta1: 0.9
[Training] [2023-10-22T23:34:51.176000] beta2: 0.96
[Training] [2023-10-22T23:34:51.178992] ]
[Training] [2023-10-22T23:34:51.182981] clip_grad_eps: 4
[Training] [2023-10-22T23:34:51.185973] injectors:[
[Training] [2023-10-22T23:34:51.188965] paired_to_mel:[
[Training] [2023-10-22T23:34:51.192954] type: torch_mel_spectrogram
[Training] [2023-10-22T23:34:51.195947] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-10-22T23:34:51.198939] in: wav
[Training] [2023-10-22T23:34:51.202928] out: paired_mel
[Training] [2023-10-22T23:34:51.205919] ]
[Training] [2023-10-22T23:34:51.208912] paired_cond_to_mel:[
[Training] [2023-10-22T23:34:51.212901] type: for_each
[Training] [2023-10-22T23:34:51.215893] subtype: torch_mel_spectrogram
[Training] [2023-10-22T23:34:51.218885] mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-10-22T23:34:51.222874] in: conditioning
[Training] [2023-10-22T23:34:51.225873] out: paired_conditioning_mel
[Training] [2023-10-22T23:34:51.228858] ]
[Training] [2023-10-22T23:34:51.232853] to_codes:[
[Training] [2023-10-22T23:34:51.235839] type: discrete_token
[Training] [2023-10-22T23:34:51.238847] in: paired_mel
[Training] [2023-10-22T23:34:51.241823] out: paired_mel_codes
[Training] [2023-10-22T23:34:51.244815] dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-10-22T23:34:51.248805] ]
[Training] [2023-10-22T23:34:51.251797] paired_fwd_text:[
[Training] [2023-10-22T23:34:51.255786] type: generator
[Training] [2023-10-22T23:34:51.258778] generator: gpt
[Training] [2023-10-22T23:34:51.261778] in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2023-10-22T23:34:51.265759] out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-10-22T23:34:51.269749] ]
[Training] [2023-10-22T23:34:51.272744] ]
[Training] [2023-10-22T23:34:51.275745] losses:[
[Training] [2023-10-22T23:34:51.278725] text_ce:[
[Training] [2023-10-22T23:34:51.281716] type: direct
[Training] [2023-10-22T23:34:51.284708] weight: 0.01
[Training] [2023-10-22T23:34:51.287704] key: loss_text_ce
[Training] [2023-10-22T23:34:51.290693] ]
[Training] [2023-10-22T23:34:51.293685] mel_ce:[
[Training] [2023-10-22T23:34:51.297674] type: direct
[Training] [2023-10-22T23:34:51.300666] weight: 1
[Training] [2023-10-22T23:34:51.304655] key: loss_mel_ce
[Training] [2023-10-22T23:34:51.307647] ]
[Training] [2023-10-22T23:34:51.310639] ]
[Training] [2023-10-22T23:34:51.314629] ]
[Training] [2023-10-22T23:34:51.317621] ]
[Training] [2023-10-22T23:34:51.320638] networks:[
[Training] [2023-10-22T23:34:51.323629] gpt:[
[Training] [2023-10-22T23:34:51.327594] type: generator
[Training] [2023-10-22T23:34:51.331609] which_model_G: unified_voice2
[Training] [2023-10-22T23:34:51.334606] kwargs:[
[Training] [2023-10-22T23:34:51.337567] layers: 30
[Training] [2023-10-22T23:34:51.341585] model_dim: 1024
[Training] [2023-10-22T23:34:51.344573] heads: 16
[Training] [2023-10-22T23:34:51.347565] max_text_tokens: 402
[Training] [2023-10-22T23:34:51.350533] max_mel_tokens: 604
[Training] [2023-10-22T23:34:51.353550] max_conditioning_inputs: 2
[Training] [2023-10-22T23:34:51.356541] mel_length_compression: 1024
[Training] [2023-10-22T23:34:51.360550] number_text_tokens: 256
[Training] [2023-10-22T23:34:51.363523] number_mel_codes: 8194
[Training] [2023-10-22T23:34:51.367487] start_mel_token: 8192
[Training] [2023-10-22T23:34:51.370503] stop_mel_token: 8193
[Training] [2023-10-22T23:34:51.373515] start_text_token: 255
[Training] [2023-10-22T23:34:51.376489] train_solo_embeddings: False
[Training] [2023-10-22T23:34:51.379480] use_mel_codes_as_input: True
[Training] [2023-10-22T23:34:51.383478] checkpointing: True
[Training] [2023-10-22T23:34:51.386436] tortoise_compat: True
[Training] [2023-10-22T23:34:51.389460] ]
[Training] [2023-10-22T23:34:51.393450] ]
[Training] [2023-10-22T23:34:51.396410] ]
[Training] [2023-10-22T23:34:51.401397] path:[
[Training] [2023-10-22T23:34:51.404389] strict_load: True
[Training] [2023-10-22T23:34:51.408402] pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-10-22T23:34:51.411369] root: ./
[Training] [2023-10-22T23:34:51.415359] experiments_root: ./training\estwood2\finetune
[Training] [2023-10-22T23:34:51.418359] models: ./training\estwood2\finetune\models
[Training] [2023-10-22T23:34:51.421343] training_state: ./training\estwood2\finetune\training_state
[Training] [2023-10-22T23:34:51.425363] log: ./training\estwood2\finetune
[Training] [2023-10-22T23:34:51.428356] val_images: ./training\estwood2\finetune\val_images
[Training] [2023-10-22T23:34:51.432313] ]
[Training] [2023-10-22T23:34:51.435336] train:[
[Training] [2023-10-22T23:34:51.439320] niter: 500
[Training] [2023-10-22T23:34:51.442323] warmup_iter: -1
[Training] [2023-10-22T23:34:51.445303] mega_batch_factor: 33
[Training] [2023-10-22T23:34:51.448303] val_freq: 5
[Training] [2023-10-22T23:34:51.451287] ema_enabled: False
[Training] [2023-10-22T23:34:51.454283] default_lr_scheme: MultiStepLR
[Training] [2023-10-22T23:34:51.457272] gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-10-22T23:34:51.460238] lr_gamma: 0.5
[Training] [2023-10-22T23:34:51.463256] ]
[Training] [2023-10-22T23:34:51.466223] eval:[
[Training] [2023-10-22T23:34:51.469214] pure: False
[Training] [2023-10-22T23:34:51.472207] output_state: gen
[Training] [2023-10-22T23:34:51.475199] ]
[Training] [2023-10-22T23:34:51.478190] logger:[
[Training] [2023-10-22T23:34:51.480185] save_checkpoint_freq: 5
[Training] [2023-10-22T23:34:51.483177] visuals: ['gen', 'mel']
[Training] [2023-10-22T23:34:51.486196] visual_debug_rate: 5
[Training] [2023-10-22T23:34:51.489186] is_mel_spectrogram: True
[Training] [2023-10-22T23:34:51.492153] ]
[Training] [2023-10-22T23:34:51.495145] is_train: True
[Training] [2023-10-22T23:34:51.498138] dist: False
[Training] [2023-10-22T23:34:51.501129]
[Training] [2023-10-22T23:34:51.504146] 23-10-22 23:34:50.973 - INFO: Random seed: 7467
[Training] [2023-10-22T23:34:52.281889] 23-10-22 23:34:52.281 - INFO: Number of training data elements: 66, iters: 1
[Training] [2023-10-22T23:34:52.284854] 23-10-22 23:34:52.281 - INFO: Total epochs needed: 500 for iters 500
[Training] [2023-10-22T23:34:53.527997] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-10-22T23:34:53.530990]   warnings.warn(
[Training] [2023-10-22T23:35:03.019533] 23-10-22 23:35:03.018 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-10-22T23:35:04.040195] 23-10-22 23:35:04.035 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-10-22T23:35:08.447475] [2023-10-22 23:35:08,447] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-10-22T23:35:10.417196] [2023-10-22 23:35:10,417] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-10-22T23:35:11.938394] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\optim\lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2023-10-22T23:35:11.938394]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2023-10-22T23:35:13.403889] C:\Users\danik\ai-voice-cloning\venv\lib\site-packages\torch\utils\checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
[Training] [2023-10-22T23:35:13.403889]   warnings.warn(
[Training] [2023-10-22T23:38:33.905473] Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
```
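The last line is the actual failure: bitsandbytes cannot find a CUDA kernel compiled for this GPU, which would be consistent with the GTX 1060/1080 Ti (Pascal) reports quoted above. If disabling BitsAndBytes in the Training configuration tab is not enough, the pinning from the quoted forum post would look roughly like this inside the venv. This is a sketch on my part: the thread itself only names `torch==2.0.1` and `bitsandbytes==0.38.1`, so the cu118 wheel index and the torchvision/torchaudio versions (the ones that pair with torch 2.0.1) are assumptions:

```bat
:: Sketch of the suggested pinning; versions other than torch==2.0.1 and
:: bitsandbytes==0.38.1 are assumptions, not taken from this thread.
call .\venv\Scripts\activate.bat

:: Reinstall torch 2.0.1 built for CUDA 11.8 from the official PyTorch index.
pip install --force-reinstall torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

:: Pin the older bitsandbytes release that reportedly still works on Pascal cards.
pip install bitsandbytes==0.38.1
```

The forum post also suggests editing `modules/dlas/requirements.txt` (at the bottom of the file) so a later setup run doesn't undo the bitsandbytes pin.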