gogobd opened 4 years ago
I have been using these parameters. I don't know if they are right, but I have started to get a pretty decent output.
mpiexec -n 3 python jukebox/train.py --hps=vqvae,small_prior,all_fp16,cpu_ema --name=pretrained_vqvae_small_prior --sample_length=1048576 --bs=4 --aug_shift --aug_blend --audio_files_dir=/home/vertigo/jukebox/learning2 --labels=False --train --test --prior --levels=3 --level=2 --weight_decay=0.01 --save_iters=1000 --restore_prior=/home/vertigo/jukebox/logs/pretrained_vqvae_small_prior/checkpoint_latest.pth.tar --lr_use_linear_decay --lr_start_linear_decay=0 --lr_decay=0.9
> --lr_decay=0.9
@ObscuraDK This means you are only invoking the lr decay for the last fraction of a single step during training (i.e., it does essentially nothing unless you are working with an absolutely massive dataset). Note that 1 step = 1/x of an iteration during training. I can't speak to the effective number of steps since I don't know what you are training on, but I am finding that with small datasets the default lr is probably too high, and the decay should perhaps persist for the entire duration of training (though I have yet to test this). You might find this link helpful for understanding what's going on here: https://www.jeremyjordan.me/nn-learning-rate/
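To make the schedule concrete, here is a toy Python sketch (not Jukebox's verbatim code) under the assumption that `--lr_decay` is the length of the linear decay window in steps and `--lr_start_linear_decay` is where that window begins; the function name `lr_scale` is mine, not from the repo:

```python
# Toy sketch of a linear lr-decay schedule (assumed semantics, not
# Jukebox's exact implementation): the lr multiplier ramps from 1.0
# down to 0.0 over `lr_decay` steps, starting at `lr_start_linear_decay`.
def lr_scale(step, lr_start_linear_decay=0.0, lr_decay=0.9):
    decay = 1.0 - max(0.0, step - lr_start_linear_decay) / lr_decay
    return max(0.0, decay)

# Under this reading, a window of 0.9 steps closes before step 1,
# so there is no gradual decay over any realistic training run:
for step in [0, 1, 1000, 100000]:
    print(step, lr_scale(step))
```

The takeaway is that a decay window of less than one step cannot give you a meaningful annealing schedule; you would want `lr_decay` to be on the order of your total training steps.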
We are running prior level 2 training on Colab. We have a group on Discord: https://discord.com/invite/6At7WwM. What kind of special setup does the dataset need for lyrics vs. non-lyrics training?
I was trying to anneal / cool off the learning rate, following the examples for building models from scratch, but when I try
I get
I don't know which other file I would have to restore (I'm using 'logs/small_prior/checkpoint_latest.pth.tar'). Maybe someone can help me with this?
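In case it helps, here is a small sketch for peeking inside the checkpoint before restoring it, assuming it is a standard `torch.save` dict (the key names in the comment are illustrative, not guaranteed by Jukebox):

```python
# Load the checkpoint on CPU and list its top-level keys, to see
# which state (model weights, optimizer, step count, ...) it holds.
import torch

path = 'logs/small_prior/checkpoint_latest.pth.tar'
ckpt = torch.load(path, map_location='cpu')

if isinstance(ckpt, dict):
    print(list(ckpt.keys()))  # e.g. 'model', 'opt', 'step' (illustrative)
else:
    print(type(ckpt))
```

Seeing which keys the file actually contains should make it clearer whether the error comes from a missing entry in this checkpoint or from a second file that needs to be restored separately.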