lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in PyTorch
MIT License

Soundstream training using birdsongs. Any guidance appreciated! #270

Open · haydensflee opened this issue 4 months ago

haydensflee commented 4 months ago

Hello, I've been trying to run AudioLM and train it on birdsong to see whether it can produce good-quality synthetic audio. I'm using bird songs from the xeno-canto dataset (https://www.kaggle.com/datasets/rohanrao/xeno-canto-bird-recordings-extended-a-m), which I've preprocessed by converting each recording to a single channel at a 22050 Hz sample rate and trimming it to 3 seconds.

I'm now trying to train the SoundStream. At the start the samples are just noise, and after about 20,000 steps I'm still not getting anything recognizable. I've read #54 for advice. Should my sample_stepcount.ema.flac sound like birdsong once the SoundStream is properly trained?
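For reference, the preprocessing described above (downmix to mono, resample to 22050 Hz, cut to 3 seconds) might look roughly like the NumPy sketch below. It is not the poster's actual script, and the function names are hypothetical. Two assumptions to flag: the resampler is naive linear interpolation with no anti-aliasing filter (a real pipeline would more likely use a proper resampler such as `torchaudio.functional.resample`), and recordings shorter than 3 seconds are zero-padded, which the post does not mention.

```python
import numpy as np

TARGET_SR = 22050     # sample rate described in the post
CLIP_SECONDS = 3.0    # clip length described in the post

def to_mono(audio: np.ndarray) -> np.ndarray:
    """Downmix (channels, samples) to 1-D mono by averaging channels; pass 1-D through."""
    if audio.ndim == 1:
        return audio.astype(np.float32)
    return audio.mean(axis=0).astype(np.float32)

def naive_resample(audio: np.ndarray, orig_sr: int, new_sr: int) -> np.ndarray:
    """Hypothetical helper: linear-interpolation resampling, no anti-aliasing (illustration only)."""
    if orig_sr == new_sr:
        return audio
    n_out = int(round(len(audio) * new_sr / orig_sr))
    t_in = np.arange(len(audio)) / orig_sr   # input sample times in seconds
    t_out = np.arange(n_out) / new_sr        # output sample times in seconds
    return np.interp(t_out, t_in, audio).astype(np.float32)

def fix_length(audio: np.ndarray, sr: int, seconds: float) -> np.ndarray:
    """Trim to `seconds`; zero-pad shorter clips (padding is an assumption, not from the post)."""
    n = int(round(sr * seconds))
    if len(audio) >= n:
        return audio[:n]
    return np.pad(audio, (0, n - len(audio)))

def preprocess(audio, orig_sr: int) -> np.ndarray:
    """Hypothetical pipeline: mono -> 22050 Hz -> exactly 3 s (66150 samples)."""
    mono = to_mono(np.asarray(audio))
    resampled = naive_resample(mono, orig_sr, TARGET_SR)
    return fix_length(resampled, TARGET_SR, CLIP_SECONDS)
```

For example, a 5-second stereo clip at 44100 Hz comes out as a 1-D float32 array of 22050 × 3 = 66150 samples. Fixing every clip to the same length keeps batches uniform, though the trainer's own data loader may also crop or pad clips on its own.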

I'm training on an A6000 GPU.

Thanks