jaywalnut310 / glow-tts

A Generative Flow for Text-to-Speech via Monotonic Alignment Search
MIT License

Log det Jacobian is wrong in Inv1x1Conv #17

Closed MokkeMeguru closed 4 years ago

MokkeMeguru commented 4 years ago

[screenshot of the log-det-Jacobian line in Inv1x1Conv]

So, this log-det Jacobian should be torch.slogdet(self.weight):

https://github.com/jaywalnut310/glow-tts/blob/00a482d06ebbffbd3518a43480cd79e7b47ebbe2/modules.py#L228

jaywalnut310 commented 4 years ago

slogdet is exactly the right function, but if the det of the weight is always positive, logdet is also okay.

I initialized the det of the weight to be positive as follows: https://github.com/jaywalnut310/glow-tts/blob/00a482d06ebbffbd3518a43480cd79e7b47ebbe2/modules.py#L200-L203
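That initialization amounts to something like the following (a minimal sketch; the function name is mine, not from the repo):

```python
import torch

def orthogonal_init_with_positive_det(n: int) -> torch.Tensor:
    # Take the orthogonal factor Q of a random Gaussian matrix;
    # an orthogonal matrix has determinant +1 or -1.
    w = torch.linalg.qr(torch.randn(n, n))[0]
    # If det(w) == -1, negate one column: this flips the determinant
    # to +1 while keeping the matrix orthogonal.
    if torch.det(w) < 0:
        w[:, 0] = -w[:, 0]
    return w
```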

Because the training process (maximizing the log-likelihood of the data) encourages the determinant to increase, the det of the weight tends to stay positive.

The main reason I prefer logdet is that it's easier and more stable to calculate than slogdet. Actually, I followed the implementation of WaveGlow: https://github.com/NVIDIA/waveglow/blob/master/glow.py#L76-L80 https://github.com/NVIDIA/waveglow/blob/d18e0f3cc2ff6bdd41244d7391140accdc41142b/glow.py#L100
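Concretely, that WaveGlow-style log-det term boils down to something like this (a rough sketch with my own names, not the exact WaveGlow code):

```python
import torch

def inv1x1_logdet_term(weight: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # weight: (c, c) kernel of the invertible 1x1 convolution.
    # z: (batch, c, time) activations the convolution is applied to.
    batch_size, _, n_frames = z.shape
    # The same (c, c) linear map is applied at every time step of every
    # batch element, so its log-determinant is counted that many times.
    return batch_size * n_frames * torch.logdet(weight)
```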

MokkeMeguru commented 4 years ago

Thanks for the information! I didn't know about the positive weight initialization because I hadn't seen WaveGlow's implementation.

MokkeMeguru commented 4 years ago

Can you show me a source or reference for logdet being more stable than slogdet?

jaywalnut310 commented 4 years ago

> The main reason I prefer logdet is that it's easier and more stable to calculate than slogdet.

I misunderstood: I thought logdet was easier to calculate than slogdet because the logdet of a positive-definite matrix has a simple formulation. For arbitrary matrices, the two functions are implemented similarly except for the sign information. Sorry for my misunderstanding, @MokkeMeguru.

However, logdet can be a good tool for checking training stability. Although the determinant of the weight is initialized to be positive and the training process encourages it to increase, the determinant can still cross zero and flip sign under some bad training configurations. In that case, logdet would return NaN and alert you that something went wrong, while slogdet wouldn't.
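As a quick illustration of that difference (a toy example, not code from this repo):

```python
import torch

w = torch.tensor([[0., 1.],
                  [1., 0.]])  # det(w) = -1

print(torch.logdet(w))             # tensor(nan) -- loudly signals det < 0
sign, logabsdet = torch.slogdet(w)
print(sign, logabsdet)             # tensor(-1.), tensor(0.) -- sign absorbed silently
```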

rafaelvalle's comments in the WaveGlow issues would also be helpful: https://github.com/NVIDIA/waveglow/issues/49#issuecomment-442522418 https://github.com/NVIDIA/waveglow/issues/35#issuecomment-442523005

MokkeMeguru commented 4 years ago
  1. slogdet vs logdet: Oh... PyTorch uses the same formulation for both. In TensorFlow, logdet uses the Cholesky method, while slogdet uses an LU decomposition. It's a more subtle issue than I thought... (see the TensorFlow snippet after this list.)

  2. logdet's flipping: We know the log_det_jacobian term does not follow the same learning curve as the negative log-likelihood. (Below are curves from my Glow training on the Oxford 102 Flowers dataset.) So I wonder whether it is really correct that log_det_jacobian always increases... (but this is not your code or problem setting, so I will monitor it in your setting.)

[training curves from a Glow run on the Oxford 102 Flowers dataset]
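To illustrate the TensorFlow distinction mentioned in point 1 (my own snippet, assuming TF 2.x's tf.linalg):

```python
import tensorflow as tf

w = tf.constant([[0., 1.],
                 [1., 0.]])  # det(w) = -1

# tf.linalg.logdet assumes a Hermitian positive-definite input and goes
# through a Cholesky factorization, so it is only valid when det > 0.
# tf.linalg.slogdet uses an LU decomposition and handles any sign:
sign, logabsdet = tf.linalg.slogdet(w)
print(sign.numpy(), logabsdet.numpy())  # -1.0 0.0
```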

Anyway, thanks for your help and reply.