lucellent opened 2 months ago
You can increase the number of frequency bands from 3 to 4, but the hyperparameters might become more complex. Alternatively, you can directly increase the channel dimension.
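The sketch below shows why a 4th band makes the hyperparameters harder. It assumes SCNet-style band splitting: the frequency axis is cut at the cumulative band ratios (rounded up with ceil), and each band is padded and downsampled by its own stride and kernel. `band_splits`, `band_output_length` and `describe` are hypothetical helpers, not functions from ZFTurbo's repo. The ratios, strides, kernels and the 2048-bin axis are illustrative values, not the real config.

```python
import math
from typing import List, Tuple


def band_splits(num_bins: int, ratios: List[float]) -> List[Tuple[int, int]]:
    """Cut [0, num_bins) into len(ratios) contiguous bands (hypothetical helper).

    Boundaries are ceil(num_bins * cumulative_ratio); the last band always
    ends at num_bins. This generalizes an assumed fixed low/mid/high split to N bands.
    """
    if not math.isclose(sum(ratios), 1.0, abs_tol=1e-6):
        raise ValueError(f"band ratios must sum to 1, got {sum(ratios)}")
    bounds = [0]
    cumulative = 0.0
    for ratio in ratios[:-1]:
        cumulative += ratio
        bounds.append(math.ceil(num_bins * cumulative))
    bounds.append(num_bins)
    return list(zip(bounds[:-1], bounds[1:]))


def band_output_length(length: int, stride: int, kernel: int) -> int:
    """Frequency bins left in one band after padding and a (kernel, stride) conv."""
    if stride == 1:
        padded = length + (kernel - 1)  # "same" padding keeps the length
    else:
        padded = length + (stride - length % stride) % stride  # pad to a multiple of stride
    return (padded - kernel) // stride + 1


def describe(num_bins: int, ratios: List[float], strides: List[int], kernels: List[int]) -> List[int]:
    # Every band needs its own ratio, stride and kernel, so these lists must line up.
    if not len(ratios) == len(strides) == len(kernels):
        raise ValueError("ratios, strides and kernels must have the same length")
    splits = band_splits(num_bins, ratios)
    return [band_output_length(end - start, s, k)
            for (start, end), s, k in zip(splits, strides, kernels)]


# Illustrative 3-band setting
print(band_splits(2048, [0.175, 0.392, 0.433]))              # [(0, 359), (359, 1162), (1162, 2048)]
print(describe(2048, [0.175, 0.392, 0.433], [1, 4, 16], [3, 4, 16]))  # [359, 201, 56]
# Hypothetical 4-band setting: one more ratio, stride and kernel to choose
print(describe(2048, [0.1, 0.2, 0.3, 0.4], [1, 2, 4, 16], [3, 2, 4, 16]))  # [205, 205, 154, 52]
```

Going to 4 bands is not just one extra number. The new ratio shifts every boundary, and each band's stride decides how many bins it contributes downstream.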
I'm using ZFTurbo's script and config. Does that mean:
conv_depths: -4
Or something else? I already tried
compress: 2
conv_kernel: 5
num_dplayer: 8
expand: 1
but I think it was too much for an RTX 3090.
num_dplayer: 8 is acceptable, but you need to reduce the audio length or batch size during training.
Okay, got it, thank you. I tried training from the existing large checkpoint with the larger config, but it seems it might be better to train a new model from scratch with the new config and then fine-tune it on a better dataset.
ZFTurbo mentioned this is what worked for him. Also, let me know if there are other parameters I can adjust to improve the model, or whether increasing num_dplayer alone is enough (I will try doubling it, from 6 to 12).
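To compare doubling num_dplayer with widening the model, here is a parameter-count sketch. It assumes each dual-path layer is two bidirectional LSTMs (one along frequency, one along time) with hidden size dim * expand, each followed by a linear projection back to dim. That layout and the dim values are assumptions for illustration, not read from ZFTurbo's SCNet code. The LSTM formula itself follows PyTorch's nn.LSTM parameter shapes (weight_ih, weight_hh and two bias vectors per direction).

```python
def lstm_params(input_size: int, hidden_size: int, bidirectional: bool = True) -> int:
    """Parameters of a single-layer PyTorch-style nn.LSTM."""
    # weight_ih: 4h x i, weight_hh: 4h x h, bias_ih and bias_hh: 4h each
    per_direction = 4 * hidden_size * (input_size + hidden_size) + 8 * hidden_size
    return per_direction * (2 if bidirectional else 1)


def dual_path_params(dim: int, expand: int, num_dplayer: int) -> int:
    """Assumed layout: per layer, two BiLSTMs, each followed by Linear(2 * hidden, dim)."""
    hidden = dim * expand
    per_rnn = lstm_params(dim, hidden) + (2 * hidden * dim + dim)
    return num_dplayer * 2 * per_rnn


print(dual_path_params(128, 1, 6))   # 3565056
print(dual_path_params(128, 1, 12))  # 7130112: deeper, about 2x
print(dual_path_params(256, 1, 6))   # 14208000: wider, about 4x
```

Under these assumptions, doubling num_dplayer doubles this block's parameters, while doubling the channel width roughly quadruples them, so width is the faster (and more memory-hungry) way to grow the model.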
Is it possible to increase the model size even beyond "large", for example by adding a new 512 band? If not, what other strategies are there to maximise the model size?
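If "adding a 512 band" means appending 512 to the channel dims list (an assumption), the effect on size can be estimated with a simplified count. The sketch below assumes one (kernel_h × 1) Conv2d between each pair of adjacent dims, which is a simplification of the real encoder. `encoder_conv_params` is a hypothetical helper, and the dims lists are illustrative values, not the shipped configs.

```python
from typing import List


def conv2d_params(c_in: int, c_out: int, kernel_h: int, kernel_w: int = 1) -> int:
    """Weights plus bias of a PyTorch-style nn.Conv2d (groups=1)."""
    return c_in * c_out * kernel_h * kernel_w + c_out


def encoder_conv_params(dims: List[int], kernel_h: int = 3) -> int:
    """Simplified count: one (kernel_h x 1) conv between each pair of adjacent dims."""
    return sum(conv2d_params(a, b, kernel_h) for a, b in zip(dims, dims[1:]))


for dims in ([4, 32, 64, 128], [4, 64, 128, 256], [4, 32, 64, 128, 512]):
    print(dims, encoder_conv_params(dims))
# [4, 32, 64, 128] 31328
# [4, 64, 128, 256] 124096
# [4, 32, 64, 128, 512] 228448
```

Because conv weights scale with c_in × c_out, doubling every width roughly quadruples the count, and a single extra 128 → 512 stage adds more than the whole original stack. Whatever is appended after the encoder also has to accept the new final width.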