jishengpeng / WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
MIT License

Model cannot converge #52

Closed VJJJJJJ1 closed 2 weeks ago

VJJJJJJ1 commented 2 weeks ago

Hi, I tried to train WavTokenizer on LJSpeech and LibriTTS respectively, but the loss does not seem to converge. I didn't change any code, only the following configuration:

data:
  class_path: decoder.dataset.VocosDataModule
  init_args:
    train_params:
      filelist_path: ./WavTokenizer-main/data/LJSPEECH/train.txt
      sampling_rate: 24000
      num_samples: 24000
      batch_size: 16
      num_workers: 8

    val_params:
      filelist_path: ./WavTokenizer-main/data/LJSPEECH/val.txt
      sampling_rate: 24000
      num_samples: 24000
      batch_size: 5 
      num_workers: 8

The loss curve is shown in the screenshot below. Is my configuration incorrect, or do I just need to keep waiting? Thanks for your reply!

[screenshot: training loss curve]
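
For reference, each filelist referenced above is assumed here to be a plain text file with one audio file path per line, in the usual Vocos-style convention (paths below are hypothetical):

    /data/LJSpeech-1.1/wavs/LJ001-0001.wav
    /data/LJSpeech-1.1/wavs/LJ001-0002.wav
    /data/LJSpeech-1.1/wavs/LJ001-0003.wav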

jishengpeng commented 2 weeks ago

The TensorBoard visualization suggests that training has stalled. I've encountered similar issues before, and they are often due to problems with the model configuration or architecture. Based on the data configuration you provided, there don't appear to be any obvious issues. It might be helpful to start by training on the LibriTTS dataset and debug progressively from there. Typically, WavTokenizer converges quickly, and within a few hundred steps the loss should already be down to double digits.
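
Switching to LibriTTS only requires pointing the data section at LibriTTS filelists; a minimal sketch based on the configuration above, assuming the filelists have been prepared at hypothetical paths under data/LibriTTS/ (everything else unchanged):

data:
  class_path: decoder.dataset.VocosDataModule
  init_args:
    train_params:
      filelist_path: ./WavTokenizer-main/data/LibriTTS/train.txt   # hypothetical path
      sampling_rate: 24000
      num_samples: 24000
      batch_size: 16
      num_workers: 8

    val_params:
      filelist_path: ./WavTokenizer-main/data/LibriTTS/val.txt     # hypothetical path
      sampling_rate: 24000
      num_samples: 24000
      batch_size: 5
      num_workers: 8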

VJJJJJJ1 commented 2 weeks ago

I reduced the learning rate by half, and the model started converging around step 900. Thanks for the reply!
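
For anyone hitting the same issue, this is a one-line change in the model section of the training config; a minimal sketch, assuming a Vocos-style initial_learning_rate field (the actual field name and default value may differ in your WavTokenizer config):

model:
  init_args:
    # halve the original learning rate; values here are hypothetical
    initial_learning_rate: 1.0e-4   # e.g. down from 2.0e-4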