lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
MIT License

Is KL-Divergence loss missing in Aligner loss definition? #29

Closed: wilson97 closed this issue 11 months ago

wilson97 commented 12 months ago

I was looking through the aligner code. In this paper (https://arxiv.org/pdf/2108.10447.pdf), the NAR aligner loss has two terms: a forward-sum loss and a KL divergence (binarization) loss between the soft and hard alignments. However, in the code I only see the forward-sum loss. Is there a reason for this, or am I missing something? Thanks.
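
For reference, the KL term in the paper reduces to a masked negative log-likelihood, because the hard alignment is one-hot along the text axis for each mel frame. Here is a minimal sketch of what I'd expect, with hypothetical names (not this repo's actual code):

```python
import torch

def binarization_loss(hard_alignment, soft_alignment, eps=1e-12):
    # L_bin from the paper: KL divergence between the hard (e.g. MAS)
    # alignment and the soft alignment. Since the hard alignment is
    # one-hot over text positions per mel frame, this reduces to a
    # masked negative log-likelihood of the soft alignment.
    # hard_alignment: (batch, text_len, mel_len), binary {0, 1}
    # soft_alignment: (batch, text_len, mel_len), sums to 1 over text_len
    log_soft = torch.log(soft_alignment.clamp(min=eps))
    return -(hard_alignment * log_soft).sum() / hard_alignment.sum().clamp(min=1)
```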

lucidrains commented 12 months ago

@wilson97 @manmay-nakhashi Manmay actually pull-requested that implementation, so he would know best!

manmay-nakhashi commented 11 months ago

@wilson97 I tried using the L_bin loss mentioned in the paper, but convergence is slow when it is enabled from the start. We can turn it on after some steps, since it regularizes alignment training.
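
Roughly, I mean a delayed schedule like this (just a sketch with hypothetical names, not actual repo code):

```python
def bin_loss_weight(step, enable_after=10_000, weight=1.0):
    # keep the binarization term off early so the forward-sum loss can
    # establish a rough alignment first, then enable it to regularize
    # (sharpen) the soft alignment toward the hard one
    return weight if step >= enable_after else 0.0
```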

lucidrains commented 11 months ago

@manmay-nakhashi maybe the loss can be added for completeness' sake and defaulted to off (loss weight of 0)?
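
e.g. something like this, just a sketch with hypothetical names:

```python
def aligner_loss(forward_sum_loss, bin_loss, bin_loss_weight=0.0):
    # total aligner loss with the binarization term defaulted to off,
    # so existing training behavior is unchanged unless opted into
    return forward_sum_loss + bin_loss_weight * bin_loss
```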

manmay-nakhashi commented 11 months ago

@lucidrains sure I'll add it.

lucidrains commented 11 months ago

@wilson97 @manmay-nakhashi hey, I had some time this morning and knocked out the loss.

Do you want to do a quick code review and see if it lines up with what you'd expect from reading the paper?

wilson97 commented 11 months ago

Sure, I can take a look.

lucidrains commented 11 months ago

I think this should be fixed.