lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License
2.36k stars 255 forks

Asking for some SoundStream training info #106

Closed amitaie closed 1 year ago

amitaie commented 1 year ago

Hey, I want to train SoundStream, and as I understood from some issues here, some people have managed to do so. I have some questions about the training procedure; hopefully they are not dumb:

  1. What dataset should I use to get results that indicate the training is working well? I saw some people chose LibriSpeech (which is at 16 kHz, right?). Why not Libri-Light or LibriTTS?
  2. With respect to 1: how many steps did the training take until the output was "not only noise"? And how many steps until it sounds good?
  3. How long does each step take? (Or how many steps do you run in 24 hours?)
  4. Does anybody feel like sharing some TensorBoard graphs so I can get an idea of whether the training is heading in the right direction?

I tried to go over most of the issues before asking these questions; I hope I didn't miss answers to them. Thanks in advance,

lucidrains commented 1 year ago

could you move this to the discussions, as it is technically not an issue?