yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MIT License

Can anyone please share checkpoints that we get after we complete both stages of training #268

Open tanishbajaj101 opened 4 months ago

tanishbajaj101 commented 4 months ago

I am running out of memory when training on the LJSpeech dataset. I want to finetune on LJSpeech itself, but I'm failing at the training stage: either I run out of memory, or I have to restrict max_len so much that I wouldn't be able to generate longer clips. Can anyone please share the checkpoints they got from the training stages, so I can finetune on a relatively smaller dataset? Also, will I be able to successfully finetune from checkpoints I download from the internet? (I'm new to this.)
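For anyone hitting the same OOM, the usual first knobs are the batch size and segment length in the training config (Configs/config.yml in the repo). The fragment below is only illustrative; the exact key names and defaults should be checked against your config file, and the values shown are examples, not recommendations:

```yaml
# Illustrative fragment of Configs/config.yml (values are examples only)
batch_size: 4    # smaller batches lower peak VRAM at the cost of slower training
max_len: 300     # maximum training segment length; reducing it also reduces memory
```

Note that these two interact: roughly halving either one roughly halves the activation memory per step, but a very small max_len may hurt quality on longer utterances, which is the trade-off described above.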

Tristan-Hopkins commented 4 months ago

https://huggingface.co/yl4579/StyleTTS2-LJSpeech/tree/main https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main You can use these models for finetuning. Is this what you were looking for?

tanishbajaj101 commented 4 months ago

> https://huggingface.co/yl4579/StyleTTS2-LJSpeech/tree/main https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main You can use these models for finetuning. Is this what you were looking for?

Yes, thanks a lot! Can you tell me which of these StyleTTS2 uses by default? I explored it using the WebUI and the Python package before building it myself.

78Alpha commented 4 months ago

LibriTTS should be the default, although LJSpeech should give better results.

martinambrus commented 2 months ago

@tanishbajaj101 could you please close the issue if it's resolved, to clear things up?