Open FlyToYourMooN opened 8 months ago
Hi @FlyToYourMooN, thanks for asking.
Could you provide some generated audio samples?
The loss_T looks normal, but loss (the negative ELBO) looks higher than usual (in my experiments the loss should be around -5.6 at 240k).
You can also check out the repo https://github.com/yoyololicon/duet-svs-diffusion. We used the 1D UNet from https://github.com/archinetai/audio-diffusion-pytorch as the denoiser (which is stronger than the non-causal WaveNet used in DiffWave) and trained it on 8 singing voice datasets (including OpenSinger). We also made the checkpoint available.
I hope this helps.
Hello, I'm trying to train a model on the OpenSinger dataset, but loss_T keeps going up and the resulting speech is almost incomprehensible. Do you have any suggestions? I'm using the default configuration, except for batch_size = 8 and a learning rate reduced by half. Thank you so much!