What's the smallest dataset size that can train a working model with audible speech

lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch

MIT License


What's the smallest dataset size that can train a working model with audible speech #153

Open zhouyong64 opened 1 year ago

zhouyong64 commented 1 year ago

I trained on LibriTTS 100 and LibriTTS 360 for 1.5 days, and the model still can't output audible speech. I'm not sure whether that's because the dataset is too small or because I'm using the code incorrectly. My question is: what is the smallest dataset size that can train a working AudioLM model that produces audible speech, either unconditionally or conditioned on something?
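When comparing answers to this question, it can help to state dataset size as total hours of audio rather than by subset name. Below is a minimal sketch that assumes the corpus is stored as uncompressed PCM `.wav` files somewhere under a single root directory. `dataset_hours` is a hypothetical helper and is not part of audiolm-pytorch.

```python
import wave
from pathlib import Path


def dataset_hours(root: str, pattern: str = "*.wav") -> float:
    """Sum the duration of every matching WAV file under `root`, in hours.

    Assumption: files are uncompressed PCM WAV readable by the stdlib `wave` module.
    """
    total_seconds = 0.0
    # rglob walks all subdirectories, matching the nested speaker/chapter layout
    for path in Path(root).rglob(pattern):
        with wave.open(str(path), "rb") as wav:
            # duration in seconds = number of frames / frames per second
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0
```

For example, `dataset_hours("path/to/LibriTTS/train-clean-100")` would report the hours in one subset. The path here is a placeholder. Reading only the WAV headers keeps this fast, because no audio samples are decoded.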

keshawnhsieh commented 1 year ago

Could you share your training hardware setup? I'm planning to start training soon and I'm not sure how long it will take with 4 V100 GPUs.