Describe the bug
I'm trying to train an Austrian TTS model with VITS, but despite trying various configurations I haven't been able to get training to start properly. It ran once with 5% of my data (around 7.5 hours) at a batch size of 2, but I suspect that isn't enough for good output quality.
import os

from TTS.tts.configs.vits_config import VitsConfig

# define model config (output_path and dataset_config are defined earlier in train.py)
config = VitsConfig(
    batch_size=16,
    eval_batch_size=8,
    batch_group_size=1,
    num_loader_workers=0,
    num_eval_loader_workers=32,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="basic_german_cleaners",
    use_phonemes=True,
    phoneme_language="de",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache_tts"),
    compute_input_seq_cache=True,
    precompute_num_workers=12,
    print_step=20,
    print_eval=True,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
    use_speaker_embedding=True,
    test_sentences=[
        "Hallo, wie geht es dir? Ich hoffe, du hast einen schönen Tag.",
        "Das ist ein Test. Wir überprüfen, ob alles wie erwartet funktioniert.",
        "Ich lerne gerade Programmierung. Es ist eine sehr nützliche Fähigkeit, die viele Türen öffnen kann.",
        "Die Sonne scheint heute. Es ist ein perfekter Tag, um draußen spazieren zu gehen und die Natur zu genießen.",
        "Ich mag Schokolade. Besonders dunkle Schokolade mit einem hohen Kakaoanteil ist mein Favorit."
    ],
    cudnn_enable=True,
    cudnn_benchmark=True,
    cudnn_deterministic=True,
)
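For reference, this config is wired into the Trainer the same way as in the standard Coqui TTS VITS recipe. The sketch below approximates that wiring rather than reproducing my exact train.py; it assumes dataset_config and output_path from above and that use_speaker_embedding=True is backed by a SpeakerManager built from the loaded samples, as in the multi-speaker recipe.

```python
from trainer import Trainer, TrainerArgs

from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.vits import Vits
from TTS.tts.utils.speakers import SpeakerManager
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

# audio processor and tokenizer are derived from the config above
ap = AudioProcessor.init_from_config(config)
tokenizer, config = TTSTokenizer.init_from_config(config)

# split the dataset into train/eval samples
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)

# use_speaker_embedding=True expects speaker IDs, collected here from the samples
speaker_manager = SpeakerManager()
speaker_manager.set_ids_from_data(train_samples + eval_samples, parse_key="speaker_name")

model = Vits(config, ap, tokenizer, speaker_manager=speaker_manager)

trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
)
trainer.fit()
```

As far as I understand, trainer.distribute starts one process per visible GPU, so batch_size in the config is effectively a per-GPU value. With batch_size=16 the run dies with the error below: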
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 23.64 GiB of which 16.50 MiB is free. Including non-PyTorch memory, this process has 23.61 GiB memory in use. Of the allocated memory 22.53 GiB is allocated by PyTorch, and 482.79 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
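The message itself suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A way to try that together with my launch command would be the following; note it only addresses fragmentation and probably won't help if batch_size=16 simply doesn't fit on a 24 GiB card:

```bash
# allocator hint taken from the OOM message above; mitigates fragmentation only
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
CUDA_VISIBLE_DEVICES="0,1,2" python -m trainer.distribute --script train.py
```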
Any suggestions for improvement?
To Reproduce
CUDA_VISIBLE_DEVICES="0,1,2" python -m trainer.distribute --script train.py
Expected behavior
No response
Logs
No response
Environment
Additional context
No response