I trained on LibriTTS 100 and LibriTTS 360 for 1.5 days, and the model still can't output audible speech. I'm not sure whether that's because the dataset is too small or because I used the code incorrectly. My question is: what is the smallest dataset size that can train a working AudioLM model that produces audible speech, either unconditionally or conditioned on something?