neonbjb / tortoise-tts

A multi-voice TTS system trained with an emphasis on quality
Apache License 2.0

Hello, I can't train the model. Does anyone have a solution to this? #699

Open · instak1ll opened this issue 7 months ago

instak1ll commented 7 months ago

The log warns "NOTE: Redirects are currently not supported in Windows or MacOs." I'm using Windows 10; could that be the problem?

```
[Training] [2023-12-04T18:01:54.444672]   model: extensibletrainer
[Training] [2023-12-04T18:01:54.447675]   scale: 1
[Training] [2023-12-04T18:01:54.453681]   gpu_ids: [0]
[Training] [2023-12-04T18:01:54.456683]   start_step: 0
[Training] [2023-12-04T18:01:54.460686]   checkpointing_enabled: True
[Training] [2023-12-04T18:01:54.464691]   fp16: False
[Training] [2023-12-04T18:01:54.469696]   bitsandbytes: True
[Training] [2023-12-04T18:01:54.472698]   gpus: 1
[Training] [2023-12-04T18:01:54.477702]   datasets:[
[Training] [2023-12-04T18:01:54.481706]     train:[
[Training] [2023-12-04T18:01:54.485709]       name: training
[Training] [2023-12-04T18:01:54.490714]       n_workers: 2
[Training] [2023-12-04T18:01:54.494719]       batch_size: 13
[Training] [2023-12-04T18:01:54.498722]       mode: paired_voice_audio
[Training] [2023-12-04T18:01:54.501727]       path: ./training/2/train.txt
[Training] [2023-12-04T18:01:54.505729]       fetcher_mode: ['lj']
[Training] [2023-12-04T18:01:54.509732]       phase: train
[Training] [2023-12-04T18:01:54.513735]       max_wav_length: 255995
[Training] [2023-12-04T18:01:54.517739]       max_text_length: 200
[Training] [2023-12-04T18:01:54.520742]       sample_rate: 22050
[Training] [2023-12-04T18:01:54.524745]       load_conditioning: True
[Training] [2023-12-04T18:01:54.527749]       num_conditioning_candidates: 2
[Training] [2023-12-04T18:01:54.532753]       conditioning_length: 44000
[Training] [2023-12-04T18:01:54.535756]       use_bpe_tokenizer: True
[Training] [2023-12-04T18:01:54.540760]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-12-04T18:01:54.544764]       load_aligned_codes: False
[Training] [2023-12-04T18:01:54.547766]       data_type: img
[Training] [2023-12-04T18:01:54.551770]     ]
[Training] [2023-12-04T18:01:54.555775]     val:[
[Training] [2023-12-04T18:01:54.558777]       name: validation
[Training] [2023-12-04T18:01:54.562781]       n_workers: 2
[Training] [2023-12-04T18:01:54.566784]       batch_size: 0
[Training] [2023-12-04T18:01:54.570788]       mode: paired_voice_audio
[Training] [2023-12-04T18:01:54.574791]       path: ./training/2/validation.txt
[Training] [2023-12-04T18:01:54.578795]       fetcher_mode: ['lj']
[Training] [2023-12-04T18:01:54.583801]       phase: val
[Training] [2023-12-04T18:01:54.587802]       max_wav_length: 255995
[Training] [2023-12-04T18:01:54.590806]       max_text_length: 200
[Training] [2023-12-04T18:01:54.594809]       sample_rate: 22050
[Training] [2023-12-04T18:01:54.598813]       load_conditioning: True
[Training] [2023-12-04T18:01:54.602818]       num_conditioning_candidates: 2
[Training] [2023-12-04T18:01:54.605819]       conditioning_length: 44000
[Training] [2023-12-04T18:01:54.610825]       use_bpe_tokenizer: True
[Training] [2023-12-04T18:01:54.613826]       tokenizer_vocab: ./modules/tortoise-tts/tortoise/data/tokenizer.json
[Training] [2023-12-04T18:01:54.617830]       load_aligned_codes: False
[Training] [2023-12-04T18:01:54.621833]       data_type: img
[Training] [2023-12-04T18:01:54.624836]     ]
[Training] [2023-12-04T18:01:54.628840]   ]
[Training] [2023-12-04T18:01:54.631843]   steps:[
[Training] [2023-12-04T18:01:54.636847]     gpt_train:[
[Training] [2023-12-04T18:01:54.639850]       training: gpt
[Training] [2023-12-04T18:01:54.643855]       loss_log_buffer: 500
[Training] [2023-12-04T18:01:54.647858]       optimizer: adamw
[Training] [2023-12-04T18:01:54.651863]       optimizer_params:[
[Training] [2023-12-04T18:01:54.654865]         lr: 1e-05
[Training] [2023-12-04T18:01:54.657867]         weight_decay: 0.01
[Training] [2023-12-04T18:01:54.660870]         beta1: 0.9
[Training] [2023-12-04T18:01:54.665874]         beta2: 0.96
[Training] [2023-12-04T18:01:54.669877]       ]
[Training] [2023-12-04T18:01:54.673881]       clip_grad_eps: 4
[Training] [2023-12-04T18:01:54.677884]       injectors:[
[Training] [2023-12-04T18:01:54.682889]         paired_to_mel:[
[Training] [2023-12-04T18:01:54.686893]           type: torch_mel_spectrogram
[Training] [2023-12-04T18:01:54.691898]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-12-04T18:01:54.694900]           in: wav
[Training] [2023-12-04T18:01:54.698904]           out: paired_mel
[Training] [2023-12-04T18:01:54.702907]         ]
[Training] [2023-12-04T18:01:54.706911]         paired_cond_to_mel:[
[Training] [2023-12-04T18:01:54.710914]           type: for_each
[Training] [2023-12-04T18:01:54.714919]           subtype: torch_mel_spectrogram
[Training] [2023-12-04T18:01:54.719923]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2023-12-04T18:01:54.722926]           in: conditioning
[Training] [2023-12-04T18:01:54.726929]           out: paired_conditioning_mel
[Training] [2023-12-04T18:01:54.729932]         ]
[Training] [2023-12-04T18:01:54.734937]         to_codes:[
[Training] [2023-12-04T18:01:54.737938]           type: discrete_token
[Training] [2023-12-04T18:01:54.742943]           in: paired_mel
[Training] [2023-12-04T18:01:54.746947]           out: paired_mel_codes
[Training] [2023-12-04T18:01:54.749950]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2023-12-04T18:01:54.753954]         ]
[Training] [2023-12-04T18:01:54.756956]         paired_fwd_text:[
[Training] [2023-12-04T18:01:54.759959]           type: generator
[Training] [2023-12-04T18:01:54.763963]           generator: gpt
[Training] [2023-12-04T18:01:54.766965]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths'] 
[Training] [2023-12-04T18:01:54.770969]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2023-12-04T18:01:54.773972]         ]
[Training] [2023-12-04T18:01:54.779977]       ]
[Training] [2023-12-04T18:01:54.783981]       losses:[
[Training] [2023-12-04T18:01:54.789986]         text_ce:[
[Training] [2023-12-04T18:01:54.794991]           type: direct
[Training] [2023-12-04T18:01:54.798994]           weight: 0.01
[Training] [2023-12-04T18:01:54.802998]           key: loss_text_ce
[Training] [2023-12-04T18:01:54.807001]         ]
[Training] [2023-12-04T18:01:54.810004]         mel_ce:[
[Training] [2023-12-04T18:01:54.813007]           type: direct
[Training] [2023-12-04T18:01:54.817011]           weight: 1
[Training] [2023-12-04T18:01:54.820013]           key: loss_mel_ce
[Training] [2023-12-04T18:01:54.824017]         ]
[Training] [2023-12-04T18:01:54.828020]       ]
[Training] [2023-12-04T18:01:54.833025]     ]
[Training] [2023-12-04T18:01:54.837029]   ]
[Training] [2023-12-04T18:01:54.841032]   networks:[
[Training] [2023-12-04T18:01:54.845036]     gpt:[
[Training] [2023-12-04T18:01:54.848039]       type: generator
[Training] [2023-12-04T18:01:54.852043]       which_model_G: unified_voice2
[Training] [2023-12-04T18:01:54.856046]       kwargs:[
[Training] [2023-12-04T18:01:54.859050]         layers: 30
[Training] [2023-12-04T18:01:54.863053]         model_dim: 1024
[Training] [2023-12-04T18:01:54.867057]         heads: 16
[Training] [2023-12-04T18:01:54.870059]         max_text_tokens: 402
[Training] [2023-12-04T18:01:54.874062]         max_mel_tokens: 604
[Training] [2023-12-04T18:01:54.879067]         max_conditioning_inputs: 2
[Training] [2023-12-04T18:01:54.882071]         mel_length_compression: 1024
[Training] [2023-12-04T18:01:54.885073]         number_text_tokens: 256
[Training] [2023-12-04T18:01:54.889077]         number_mel_codes: 8194
[Training] [2023-12-04T18:01:54.893080]         start_mel_token: 8192
[Training] [2023-12-04T18:01:54.896082]         stop_mel_token: 8193
[Training] [2023-12-04T18:01:54.901087]         start_text_token: 255
[Training] [2023-12-04T18:01:54.906092]         train_solo_embeddings: False
[Training] [2023-12-04T18:01:54.909095]         use_mel_codes_as_input: True
[Training] [2023-12-04T18:01:54.912098]         checkpointing: True
[Training] [2023-12-04T18:01:54.918104]         tortoise_compat: True
[Training] [2023-12-04T18:01:54.922108]       ]
[Training] [2023-12-04T18:01:54.925109]     ]
[Training] [2023-12-04T18:01:54.929112]   ]
[Training] [2023-12-04T18:01:54.932115]   path:[
[Training] [2023-12-04T18:01:54.937120]     strict_load: True
[Training] [2023-12-04T18:01:54.940122]     pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2023-12-04T18:01:54.944126]     root: ./
[Training] [2023-12-04T18:01:54.947129]     experiments_root: ./training\2\finetune
[Training] [2023-12-04T18:01:54.951132]     models: ./training\2\finetune\models
[Training] [2023-12-04T18:01:54.955136]     training_state: ./training\2\finetune\training_state
[Training] [2023-12-04T18:01:54.959140]     log: ./training\2\finetune
[Training] [2023-12-04T18:01:54.963143]     val_images: ./training\2\finetune\val_images
[Training] [2023-12-04T18:01:54.966147]   ]
[Training] [2023-12-04T18:01:54.970150]   train:[
[Training] [2023-12-04T18:01:54.973153]     niter: 200
[Training] [2023-12-04T18:01:54.977156]     warmup_iter: -1
[Training] [2023-12-04T18:01:54.980158]     mega_batch_factor: 3
[Training] [2023-12-04T18:01:54.984163]     val_freq: 50
[Training] [2023-12-04T18:01:54.987166]     ema_enabled: False
[Training] [2023-12-04T18:01:54.991170]     default_lr_scheme: MultiStepLR
[Training] [2023-12-04T18:01:54.994172]     gen_lr_steps: [2, 4, 9, 18, 25, 33, 50]
[Training] [2023-12-04T18:01:54.998176]     lr_gamma: 0.5
[Training] [2023-12-04T18:01:55.001179]   ]
[Training] [2023-12-04T18:01:55.005182]   eval:[
[Training] [2023-12-04T18:01:55.008185]     pure: False
[Training] [2023-12-04T18:01:55.013189]     output_state: gen
[Training] [2023-12-04T18:01:55.017193]   ]
[Training] [2023-12-04T18:01:55.020195]   logger:[
[Training] [2023-12-04T18:01:55.024199]     save_checkpoint_freq: 50
[Training] [2023-12-04T18:01:55.028202]     visuals: ['gen', 'mel']
[Training] [2023-12-04T18:01:55.033207]     visual_debug_rate: 50
[Training] [2023-12-04T18:01:55.036211]     is_mel_spectrogram: True
[Training] [2023-12-04T18:01:55.040214]   ]
[Training] [2023-12-04T18:01:55.045218]   is_train: True
[Training] [2023-12-04T18:01:55.050222]   dist: False
[Training] [2023-12-04T18:01:55.055227]
[Training] [2023-12-04T18:01:55.061233] 23-12-04 18:01:54.440 - INFO: Random seed: 1464
[Training] [2023-12-04T18:01:56.224289] 23-12-04 18:01:56.224 - INFO: Number of training data elements: 6, iters: 1
[Training] [2023-12-04T18:01:56.229293] 23-12-04 18:01:56.224 - INFO: Total epochs needed: 200 for iters 200
[Training] [2023-12-04T18:01:57.315203] D:\RVC\TORTOISE TTS\ai-voice-cloning\venv\lib\site-packages\transformers\configuration_utils.py:363: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2023-12-04T18:01:57.320208]   warnings.warn(
[Training] [2023-12-04T18:02:07.500687] 23-12-04 18:02:07.500 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2023-12-04T18:02:12.495284] 23-12-04 18:02:12.400 - INFO: Start training from epoch: 0, iter: 0
[Training] [2023-12-04T18:02:15.257289] [2023-12-04 18:02:15,257] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-04T18:02:15.278307] [2023-12-04 18:02:15,278] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2023-12-04T18:02:23.795688] 23-12-04 18:02:23.795 - INFO: Saving models and training states.
[Training] [2023-12-04T18:02:23.797690] 23-12-04 18:02:23.795 - INFO: Finished training!
```
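
Possibly relevant: the log reports "Number of training data elements: 6" while my batch_size is 13. A minimal sketch of why that combination could make training end almost immediately (my guess only, assuming the dataloader drops incomplete batches; nothing in the log confirms this):

```python
# Guess, not confirmed: with 6 elements and batch_size 13, a DataLoader
# that drops incomplete batches yields zero batches per epoch, so all
# 200 "epochs" finish without a single optimizer step.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.zeros(6, 1))  # 6 elements, as in the log
loader = DataLoader(dataset, batch_size=13, drop_last=True)
print(len(loader))  # 0 -> nothing to train on
```

If that is the cause, lowering batch_size below 6 (or adding more clips) should at least change the behaviour.
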
genewitch commented 7 months ago

If you're on Windows, there's an issue with the way the caching folders are handled. You have to either enable Developer Mode on Windows or run Python as administrator; I'd try that first (seeing as the warning mentions Windows specifically).
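
To check whether that's actually what's failing, here's a small sketch (assuming the caching problem boils down to symlink permissions, which is exactly what Developer Mode or an elevated prompt grants):

```python
# Quick probe: try to create a symlink in a temp dir. On Windows this
# raises OSError (WinError 1314) unless Developer Mode is enabled or
# the process is elevated.
import os
import tempfile

def can_create_symlinks() -> bool:
    """Return True if this process is allowed to create symlinks."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")
        open(target, "w").close()
        try:
            os.symlink(target, link)
            return True
        except OSError:  # privilege not held
            return False

print("symlink creation allowed:", can_create_symlinks())
```

If it prints False in your normal shell but True in an administrator shell, the Developer Mode / run-as-administrator route should sort it out.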

cesinsingapore commented 4 months ago

How do you train it? Is there any documentation?