lucidrains / audiolm-pytorch

Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch
MIT License

Trying to overfit SoundStream #276

Open hishammadcor opened 3 months ago

hishammadcor commented 3 months ago

I am trying to overfit SoundStream on 10 samples from the Common Voice dataset, as a sanity check before training on the whole dataset. But the output is just noise, and I do not know what the problem is. I also tried overfitting on only 1 sample.

This is the code I use:

from audiolm_pytorch import SoundStream, SoundStreamTrainer

soundstream = SoundStream(
    codebook_size = 1024,
    rq_num_quantizers = 8,
    rq_groups = 2,
    use_lookup_free_quantizer = True,
    use_finite_scalar_quantizer = False,
    attn_window_size = 128,
    attn_depth = 2
)

trainer = SoundStreamTrainer(
    soundstream,
    folder = "/data/data/",
    batch_size = 1,
    grad_accum_every = 2,        
    data_max_length_seconds = 2, 
    save_results_every = 1,
    save_model_every = 4,
    num_train_steps = 1_000
).cuda()

trainer.train()

The training losses for the first 500+ steps are in this text file: audiolm_pytorch_demo.txt

I am using the latest version, 2.0.7, with Python 3.10.4.
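
For reference, a small overfitting subset like the one described above can be assembled with a short script. This is only a minimal sketch, not the exact procedure used here: the helper name `make_overfit_subset`, the source directory in the usage comment, and the list of audio extensions are all assumptions. Only the destination folder "/data/data/" comes from the trainer call above.

```python
import shutil
from pathlib import Path

# Assumed set of audio extensions; adjust to whatever the dataset actually contains.
AUDIO_EXTS = {".wav", ".flac", ".mp3"}

def make_overfit_subset(src_dir, dst_dir, n=10):
    """Hypothetical helper: copy the first `n` audio files (sorted by path)
    from `src_dir` into `dst_dir` and return the copied paths."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Collect candidates before copying so files already in dst are not re-scanned mid-copy.
    files = sorted(
        p for p in src.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_EXTS
    )
    chosen = files[:n]

    copied = []
    for p in chosen:
        # Note: files with the same name in different subfolders would overwrite each other.
        target = dst / p.name
        shutil.copy2(p, target)
        copied.append(target)
    return copied

# Example usage (paths are placeholders):
# make_overfit_subset("/path/to/common_voice/clips", "/data/data/", n=10)
```

Sorting the paths keeps the subset deterministic, so the same 10 clips are selected on every run and results stay comparable between experiments.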