Hi, sure, happy to expand on it. The Jacobian you are looking at there comes from the sigmoid $\sigma(z)$ in the next line. Essentially, we need to calculate:

$$ldj = \log \frac{\partial}{\partial z} \sigma(z)$$

The derivative of the sigmoid is commonly known as

$$\frac{\partial}{\partial z} \sigma(z)=\sigma(z)(1-\sigma(z))$$

(see e.g. https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e for the steps). Now we can plug it into the ldj equation and expand the log:

$$ldj = \log \big(\sigma(z)(1-\sigma(z))\big) = \log \sigma(z) + \log (1 - \sigma(z))$$

$$\log \sigma(z) = \log \frac{1}{1+\exp(-z)}=-\log(1+\exp(-z))$$

$$\log (1 - \sigma(z)) = \log \frac{\exp(-z)}{1+\exp(-z)} = -z - \log(1+\exp(-z))$$

Combining them gives us:

$$ldj = -z - 2 \cdot \log(1 + \exp(-z))$$

The term $\log(1 + \exp(-z))$ is also known as the softplus function (https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html), and PyTorch provides a numerically stable version of it. Thus, our final ldj becomes:

$$ldj = -z - 2 \cdot \text{softplus}(-z)$$

This is what we implement, summed over all dimensions except the batch dimension, since the sigmoid is applied element-wise to every value in the image.
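If you want to verify the formula numerically, here is a minimal sketch (not part of the tutorial; the tensor shape is just an example) that compares the analytic ldj with the element-wise derivative obtained from autograd:

```python
import torch
import torch.nn.functional as F

# Example tensor; in the tutorial z has shape [batch, channels, height, width]
z = torch.randn(4, 1, 8, 8, dtype=torch.double, requires_grad=True)

# Analytic per-element log-derivative of the sigmoid, summed over non-batch dims
ldj_analytic = (-z - 2 * F.softplus(-z)).sum(dim=[1, 2, 3])

# Autograd check: the sigmoid acts element-wise, so the Jacobian is diagonal and
# its log-determinant is the sum of log sigma'(z_i) over all non-batch dimensions
x = torch.sigmoid(z)
grad = torch.autograd.grad(x.sum(), z)[0]   # element-wise sigma'(z)
ldj_autograd = grad.log().sum(dim=[1, 2, 3])

print(torch.allclose(ldj_analytic, ldj_autograd))  # True
```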
Hope that helps, let me know if something is unclear. :)
Oh, thanks! And could you explain this operation?
```python
ldj -= np.log(self.quants) * np.prod(z.shape[1:])
```
I know you're subtracting from the ldj because you're dividing `z` by 256, right? But why do you use the product?
Correct, the `np.log(self.quants)` is because of the division. We do this division for every element in the batch, which for an image is `height * width * channels` elements. This is why we take the product over these axes. You can also imagine that we would have a tensor of size `[batch, channels, height, width]`, all with the value `np.log(self.quants)` for the division. Then we sum over the last three axes, as in the previous ldj calculation. This is equivalent to the product.
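To make the equivalence concrete, here is a small sketch (the tensor shape and the quantization level of 256 are example values, not taken from a specific cell) showing that summing the constant over the non-batch axes gives the same result as the product form:

```python
import numpy as np
import torch

quants = 256                                      # example quantization level
z = torch.randn(4, 3, 8, 8, dtype=torch.double)   # example [batch, channels, height, width]

# Option 1: a tensor filled with log(quants), summed over the non-batch axes
per_element = torch.full_like(z, float(np.log(quants)))
ldj_sum = per_element.sum(dim=[1, 2, 3])          # shape [batch]

# Option 2: the constant times the number of elements per image
const = float(np.log(quants) * np.prod(z.shape[1:]))
ldj_prod = const * torch.ones(z.shape[0], dtype=z.dtype)

print(torch.allclose(ldj_sum, ldj_prod))          # True
```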
Oh, I see, thank you very much! And to finish this round of questions, could you please expand on these operations:
```python
z = z * (1 - self.alpha) + 0.5 * self.alpha  # Scale to prevent boundaries 0 and 1
ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
```
I understand why we need to avoid the boundaries. But when calculating the Jacobian of such a transformation, I wonder why only the `(1 - self.alpha)` term is considered.
Thank you so much!
The Jacobian is based on the derivative of the transformation. The term `0.5 * self.alpha` is an additive constant here, so it does not influence the derivative. In other words, you have:

$$ldj = \log \frac{\partial}{\partial z} (a\cdot z + b)=\log \frac{\partial}{\partial z} (a\cdot z) = \log a$$
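As a quick sanity check (a sketch with an arbitrary value for alpha and an example shape, not the tutorial's code), autograd confirms that the shift `0.5 * alpha` drops out and each element contributes exactly $\log(1 - \alpha)$:

```python
import numpy as np
import torch

alpha = 1e-5                                             # example value for self.alpha
z = torch.rand(4, 1, 8, 8, dtype=torch.double, requires_grad=True)

# The scaling discussed above: move values away from the boundaries 0 and 1
out = z * (1 - alpha) + 0.5 * alpha

# Element-wise map -> diagonal Jacobian with constant derivative (1 - alpha)
grad = torch.autograd.grad(out.sum(), z)[0]
ldj_autograd = grad.log().sum(dim=[1, 2, 3])

# Closed form: log(1 - alpha) times the number of elements per image
const = float(np.log(1 - alpha) * np.prod(z.shape[1:]))
ldj_closed = const * torch.ones(z.shape[0], dtype=z.dtype)

print(torch.allclose(ldj_autograd, ldj_closed))  # True; the additive 0.5 * alpha has no effect
```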
Thanks! You can close the issue =)
Tutorial: 11
**Describe the bug**
Could you explain the origin of the numbers in the Jacobian determinant for Dequantization and Variational Dequantization?
Cell 6:
Especially the case of
```python
ldj += (-z - 2 * F.softplus(-z)).sum(dim=[1, 2, 3])
```
I have basic knowledge of Jacobians (I studied computer science). The tutorial is great, but I would appreciate a brief introduction to why the Jacobian is handled this way.
Thanks in advance