phlippe / uvadlc_notebooks

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
https://uvadlc-notebooks.readthedocs.io/en/latest/
MIT License

Tutorial 11: Dequantization and quantization process #57

Closed sy-eng closed 2 years ago

sy-eng commented 2 years ago

Thank you for your great tutorials!

I'm trying Tutorial 11 and have two questions on the dequantization and quantization process (the code in the 6th-8th cells).

I found 3 ldj updates in the dequantization process but only 2 in the quantization process. I guess this means the quantization process is not theoretically the inverse of the dequantization process. The reason is the scaling that keeps values away from the boundaries 0 and 1, applied in the `sigmoid` function in the 6th cell during dequantization:

```python
z = z * (1 - self.alpha) + 0.5 * self.alpha
```

I added the following lines to the `sigmoid` function for the quantization process:

```python
ldj -= np.log(1 - self.alpha) * np.prod(z.shape[1:])
z = (z - 0.5 * self.alpha) / (1 - self.alpha)
```

With this change, the test succeeded.
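
For context, here is a minimal sketch of what the whole `sigmoid` method might look like with this fix in place (assuming the tutorial's `Dequantization` class; the ldj expression for the sigmoid itself is paraphrased from the Jacobian, not copied from the notebook):

```python
import numpy as np
import torch
import torch.nn.functional as F

def sigmoid(self, z, ldj, reverse=False):
    # Invertible sigmoid; reverse=True is the logit used during dequantization
    if not reverse:
        # Quantization direction: logits -> (0, 1)
        ldj += (-z - 2 * F.softplus(-z)).sum(dim=[1, 2, 3])  # log sigmoid'(z)
        z = torch.sigmoid(z)
        # Suggested fix: undo the alpha-scaling applied in the reverse branch,
        # so the two branches are exact inverses of each other
        ldj -= np.log(1 - self.alpha) * np.prod(z.shape[1:])
        z = (z - 0.5 * self.alpha) / (1 - self.alpha)
    else:
        # Dequantization direction: (0, 1) -> logits
        z = z * (1 - self.alpha) + 0.5 * self.alpha  # keep z away from 0 and 1
        ldj += np.log(1 - self.alpha) * np.prod(z.shape[1:])
        ldj += (-torch.log(z) - torch.log(1 - z)).sum(dim=[1, 2, 3])
        z = torch.log(z) - torch.log(1 - z)
    return z, ldj
```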

Alternatively, I guess smaller values (z < self.alpha) could simply be clipped to z = self.alpha, which would not require an ldj update. And of course, since the test failure is not serious and the ldj update is very small, this could also just be ignored.

The area under the plotted curve for -0.5 < z < 0.5 alone is larger than 1.5, I guess, which means the total area is much larger than 1. I also found that the plotted "prob" array in the 8th cell is [1, 1, ..., 1]. In that cell, the prior array is scaled so that a value of 1 for each entry means a uniform distribution. So the prior array should be normalized before being multiplied with the "prob" array:

```python
prob = prob * prior[out] / quants
```

Theoretically, the figure should show prob = e^{-z} / (1 + e^{-z})^2, and the output after this modification does look like e^{-z} / (1 + e^{-z})^2.
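
(That curve is just the derivative of the sigmoid: if x = sigmoid(z) is uniform on [0, 1], then p(z) = |dx/dz| = sigmoid'(z) = e^{-z} / (1 + e^{-z})^2, which integrates to 1. A quick standalone check, not code from the notebook:)

```python
import numpy as np

z = np.linspace(-10.0, 10.0, 10001)
sigma = 1.0 / (1.0 + np.exp(-z))
density = np.exp(-z) / (1.0 + np.exp(-z)) ** 2  # the expected curve

assert np.allclose(density, sigma * (1.0 - sigma))  # same function: sigmoid'(z)
print(np.trapz(density, z))  # ~1.0, so the plotted area should be 1
```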

Thank you.

phlippe commented 2 years ago

Hi, thanks for your comments!

(1) That's very true, thank you for pointing this out. I think I initially didn't add it in the reverse since it can cause z to be potentially smaller than 0 or larger than 1, but since we clamp z into the necessary range during dequantization, it should be ok. I will add the inverse of this scaling so that the reverse pass is a more accurate inverse of the forward pass. I'll also update the comment after this cell.

(2) Cell 8 contains the following line:

```python
prior = prior / prior.sum() * quants  # In the following, we assume 1 for each value means uniform distribution
```

I think I initially had it in there just to clarify that each section has an area of 1, which is easier to compare against than the true probabilities. However, I see how this can be confusing, and I'll remove the line so that the overall area is 1 again.
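
Concretely, with a uniform prior that line turns every entry into 1 (the quants value here is just a hypothetical example):

```python
import numpy as np

quants = 8                            # hypothetical value, for illustration only
prior = np.ones(quants)               # uniform prior over the discrete values
prior = prior / prior.sum() * quants  # the line from cell 8
print(prior)                          # [1. 1. 1. 1. 1. 1. 1. 1.]
# Every section then has area 1, so the total plotted area is quants, not 1.
```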

The newest commit should fix these things. Please let me know if you find any other issues!

sy-eng commented 2 years ago

Thank you very much!

> I'll remove the line so that the overall area is 1 again.

Sometimes prior.sum() may not be 1 by mistake, so I think the value of prior.sum() should be checked, or the prior normalized:

```python
prior = prior / prior.sum()
```

Only "* quants" should be removed. Of course, the newest commit is OK!

> I think I initially didn't add it in the reverse since this can cause z to be potentially smaller than 0 or larger than 1, but since we clamp it into the necessary range during dequantization, it should be ok.

Handling special cases or exceptions is very difficult...

Anyway, thank you for the newest commit! And have a nice weekend!

phlippe commented 2 years ago

The name 'prior' should probably make clear that we are looking for a normalized distribution, but I can see why errors might occur there when people want to try new distributions. So, I added the line back with the quants factor. Thanks!

sy-eng commented 2 years ago

Thank you!