lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License
2.32k stars 249 forks

Model cascade training #272

Open a897456 opened 3 months ago

a897456 commented 3 months ago

Hi @lucidrains, for cascaded model training, do you train the SoundStream first and then train the second model with the trained codec loaded via SoundStream.init_and_load_from('./path/to/checkpoint.pt')?

Incorrect demonstration

    from audiolm_pytorch import SoundStream
    from naturalspeech2_pytorch import NaturalSpeech2, Trainer

    # the codec is created with freshly initialized (untrained) weights
    codec = SoundStream()
    model = NaturalSpeech2(codec = codec, timesteps = 1000)

    trainer = Trainer(diffusion_model = model)
    trainer.train()

Correct demonstration

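    from audiolm_pytorch import SoundStream
    from naturalspeech2_pytorch import NaturalSpeech2, Trainer

    # the codec is restored from the checkpoint saved after SoundStream training
    codec = SoundStream.init_and_load_from('./path/to/checkpoint.pt')
    model = NaturalSpeech2(codec = codec, timesteps = 1000)

    trainer = Trainer(diffusion_model = model)
    trainer.train()

Is that right?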
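For reference, the difference between the two snippets can be illustrated with a minimal plain-PyTorch sketch. Everything here is an assumption made for illustration: `TinyCodec` is a hypothetical stand-in for a codec (it is not SoundStream), and a generic `state_dict` save/load stands in for `init_and_load_from`. It only shows that a newly constructed module does not carry the weights from an earlier training stage, while a module restored from the checkpoint does.

```python
import os
import tempfile

import torch
from torch import nn


class TinyCodec(nn.Module):
    """Hypothetical stand-in for a codec such as SoundStream."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.proj(x)


def same_weights(a, b):
    """Return True if both modules hold identical parameter values."""
    return all(torch.equal(p, q) for p, q in zip(a.parameters(), b.parameters()))


# Stage 1: "train" the codec (a single SGD step here) and save a checkpoint.
torch.manual_seed(0)
trained = TinyCodec()
opt = torch.optim.SGD(trained.parameters(), lr=0.1)
loss = trained(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

ckpt_path = os.path.join(tempfile.mkdtemp(), "codec.pt")
torch.save(trained.state_dict(), ckpt_path)

# Stage 2, first variant: a fresh instance starts from new random weights.
fresh = TinyCodec()

# Stage 2, second variant: restore the stage-1 weights before reusing the codec.
restored = TinyCodec()
restored.load_state_dict(torch.load(ckpt_path))

print(same_weights(trained, fresh))     # False
print(same_weights(trained, restored))  # True
```

Under these assumptions, only the restored instance would hand the trained codec to the second-stage model.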