phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
https://uvadlc-notebooks.readthedocs.io/en/latest/
MIT License

[Question] Tutorial 11, Variational dequantization and log jacobian calculation #135

Status: Closed (mjack3 closed this 9 months ago)

mjack3 commented 9 months ago

Tutorial: 11

Describe the bug: Could you explain where the numbers in the log-determinant of the Jacobian (ldj) come from in the Dequantization and Variational Dequantization classes?

Cell 6:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


class Dequantization(nn.Module):

    def __init__(self, alpha=1e-5, quants=256):
        """
        Inputs:
            alpha - small constant that is used to scale the original input.
                    Prevents dealing with values very close to 0 and 1 when inverting the sigmoid
            quants - Number of possible discrete values (usually 256 for 8-bit image)
        """
        super().__init__()
        self.alpha = alpha
        self.quants = quants

    def forward(self, z, ldj, reverse=False):
        if not reverse:
            # Discrete -> continuous: dequantize, then map from (0, 1) to the reals via the inverse sigmoid
            z, ldj = self.dequant(z, ldj)
            z, ldj = self.sigmoid(z, ldj, reverse=True)
        else:
            # Continuous -> discrete: apply the sigmoid, scale back to [0, quants) and discretize
            z, ldj = self.sigmoid(z, ldj, reverse=False)
            z = z * self.quants
            ldj += np.log(self.quants) * np.prod(z.shape[1:])
            z = torch.floor(z).clamp(min=0, max=self.quants-1).to(torch.int32)
        return z, ldj

    def sigmoid(self, z, ldj, reverse=False):
        # Applies an invertible sigmoid transformation
        if not reverse:
            ldj += (-z-2*F.softplus(-z)).sum(dim=[1,2,3])
            z = torch.sigmoid(z)
            # Reversing scaling for numerical stability
            ldj -= np.log(1 - self.alpha) * np.prod(z.shape[1:])
            z = (z - 0.5 * self.alpha) / (1 - self.alpha)
        else:
            z = z * (1 - self.alpha) + 0.5 * self.alpha  # Scale to prevent boundaries 0 and 1
            ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
            ldj += (-torch.log(z) - torch.log(1-z)).sum(dim=[1,2,3])
            z = torch.log(z) - torch.log(1-z)
        return z, ldj

    def dequant(self, z, ldj):
        # Transform discrete values to continuous volumes
        z = z.to(torch.float32)
        z = z + torch.rand_like(z).detach()
        z = z / self.quants
        ldj -= np.log(self.quants) * np.prod(z.shape[1:])
        return z, ldj

Especially this line:

ldj += (-z-2*F.softplus(-z)).sum(dim=[1,2,3])

I have a basic knowledge of Jacobians (I studied computer science). The tutorial is great, but I would appreciate a short introduction to why the Jacobian is handled this way.

Thanks in advance

phlippe commented 9 months ago

Hi, sure, happy to expand on it. The Jacobian you are looking at there comes from the sigmoid $\sigma(z)$ applied in the next line. Essentially, we need to calculate

$$ldj = \log \frac{\partial}{\partial z} \sigma(z)$$

The derivative of the sigmoid is commonly known to be

$$\frac{\partial}{\partial z} \sigma(z)=\sigma(z)\left(1-\sigma(z)\right)$$

(see e.g. https://towardsdatascience.com/derivative-of-the-sigmoid-function-536880cf918e for the steps). Now we can plug it into the ldj equation and expand the log:

$$ldj = \log\left[\sigma(z)(1-\sigma(z))\right] = \log \sigma(z) + \log (1 - \sigma(z))$$

$$\log \sigma(z) = \log \frac{1}{1+\exp(-z)}=-\log(1+\exp(-z))$$

$$\log (1 - \sigma(z)) = \log \frac{\exp(-z)}{1+\exp(-z)} = -z - \log(1+\exp(-z))$$

Combining them gives us:

$$ldj = -z - 2 \cdot \log(1 + \exp(-z))$$

The second part is also known as the softplus function (https://pytorch.org/docs/stable/generated/torch.nn.Softplus.html), for which PyTorch provides a numerically stable implementation. Thus, our final ldj becomes:

$$ldj = -z - 2 \cdot \text{softplus}(-z)$$

This is what we implement, and we sum over all dimensions except the batch dimension since the sigmoid is applied elementwise to the image.
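If you want to convince yourself numerically, here is a small check (just a sketch using torch.autograd, not taken from the notebook) comparing the closed-form expression with the derivative of torch.sigmoid computed by autograd:

import torch
import torch.nn.functional as F

# Verify that -z - 2*softplus(-z) equals log(d sigmoid(z)/dz)
z = torch.randn(1000, dtype=torch.float64, requires_grad=True)
s = torch.sigmoid(z)
(grad,) = torch.autograd.grad(s.sum(), z)   # exact elementwise derivative of the sigmoid

closed_form = (-z - 2 * F.softplus(-z)).detach()  # our per-element ldj
autograd_form = grad.log()                        # log of the autograd derivative

print(torch.allclose(closed_form, autograd_form))  # True (up to floating-point precision)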

Hope that helps, let me know if something is unclear. :)

mjack3 commented 9 months ago

Oh, thanks! And could you explain this operation?

ldj -= np.log(self.quants) * np.prod(z.shape[1:])

I know you're subtracting from the ldj because you're dividing z by 256, right? But why do you use the product?

phlippe commented 9 months ago

Correct, the np.log(self.quants) term comes from the division. We apply this division to every element of each sample, which for an image is height * width * channels elements. That is why we take the product over these axes. You can also picture a tensor of size [batch, channels, height, width] filled with the value np.log(self.quants), one entry per divided element, and then sum over the last three axes, as in the previous ldj calculation. That sum is equivalent to the product np.log(self.quants) * np.prod(z.shape[1:]).
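To see the equivalence concretely, here is a tiny example (a sketch with an arbitrary [8, 3, 32, 32] shape, not from the notebook):

import numpy as np
import torch

quants = 256
z = torch.randn(8, 3, 32, 32)  # [batch, channels, height, width]; example shape only

# A tensor filled with log(quants), one entry per element, summed over the non-batch dims ...
summed = torch.full_like(z, float(np.log(quants))).sum(dim=[1, 2, 3])

# ... equals log(quants) times the number of elements per sample
scaled = np.log(quants) * np.prod(z.shape[1:])  # log(256) * (3 * 32 * 32)

print(summed[0].item(), float(scaled))  # both approximately 17034.79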

mjack3 commented 9 months ago

Oh! I see, thank you very much! And to end this questionnaire, could you please expand on these operations:

z = z * (1 - self.alpha) + 0.5 * self.alpha  # Scale to prevent boundaries 0 and 1
ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])

I understand why we need to avoid the boundaries. But when calculating the Jacobian of such a transformation, I wonder why only the (1 - self.alpha) term is considered.

Thank you so much!

phlippe commented 9 months ago

The Jacobian is based on the derivative of the transformation. The term 0.5 * self.alpha is an additive constant here, so it does not influence the derivative. In other words, with $a = 1 - \alpha$ and $b = 0.5\alpha$ you have:

$$ldj = \log \frac{\partial}{\partial z} (a\cdot z + b)=\log \frac{\partial}{\partial z} (a\cdot z) = \log a = \log(1 - \alpha)$$
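A quick autograd check (again just a sketch, reusing the notebook's default alpha=1e-5) confirms that the per-element contribution is exactly log(1 - alpha):

import numpy as np
import torch

alpha = 1e-5
z = torch.rand(10, dtype=torch.float64, requires_grad=True)  # values in (0, 1)

# The scaling from the sigmoid() method: z * (1 - alpha) + 0.5 * alpha
out = z * (1 - alpha) + 0.5 * alpha
(grad,) = torch.autograd.grad(out.sum(), z)

# The derivative is (1 - alpha) everywhere; the additive constant drops out
print(grad.log())          # every entry equals log(1 - alpha)
print(np.log(1 - alpha))   # the per-element ldj contribution used in the code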

mjack3 commented 9 months ago

Thanks! You can close the issue =)