openai / glow

Code for reproducing results in "Glow: Generative Flow with Invertible 1x1 Convolutions"
https://arxiv.org/abs/1807.03039
MIT License

Why do you add this uniform noise to the inputs? #43

Closed hq-liu closed 6 years ago

hq-liu commented 6 years ago

https://github.com/openai/glow/blob/654ddd0ddd976526824455074aa1eaaa92d095d8/model.py#L171

I found that https://github.com/taesung89/real-nvp/blob/5ec7a22bbae529e44d60bd6664a7753ae6772dfa/real_nvp/model.py#L42-L47 refers to this step as corrupting the data. However, I did not find it in the Glow paper. Is it necessary to add this noise? Will it have any impact on the results? Thanks very much!

NTT123 commented 6 years ago

This is a trick used to convert discrete color values (e.g., 0..255) into continuous values; without it, a continuous density model could place arbitrarily high likelihood on the discrete points. It is mentioned in the experiments section of the paper NICE: Non-linear Independent Components Estimation.
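For concreteness, here is a minimal NumPy sketch of the idea (my own illustration, not the repo's exact preprocessing): each integer pixel value is spread uniformly over a unit interval before rescaling, so the data becomes genuinely continuous.

```python
import numpy as np

# Minimal sketch of uniform dequantization (illustration only, not glow's
# exact code). Each discrete value k in {0, ..., 255} is replaced by a
# sample from [k, k+1), then rescaled to [0, 1).
def dequantize(x_uint8, n_bins=256, rng=np.random.default_rng(0)):
    x = x_uint8.astype(np.float64)
    x = x + rng.uniform(size=x.shape)  # spread each integer over [k, k+1)
    return x / n_bins                  # continuous values in [0, 1)

x = np.array([0, 128, 255], dtype=np.uint8)
print(dequantize(x))  # three continuous values in [0, 1)
```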

lxuechen commented 6 years ago

Hi, I have a related question, except it has to do with the line below.

https://github.com/openai/glow/blob/654ddd0ddd976526824455074aa1eaaa92d095d8/model.py#L172

Why do we add the total number of bits/nats needed to encode an image before all the invertible transformations? Thanks in advance!

lxuechen commented 6 years ago

OK, so by looking at the Real NVP code, I think I figured this out. It accounts for the division by 256, which is itself an invertible transformation and has that term as the log-determinant of its Jacobian.
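To spell that out with my own numbers (assuming 32x32x3 inputs, not anything the repo fixes): scaling by 1/n_bins multiplies each of the D dimensions by 1/n_bins, so its Jacobian log-determinant is -D * log(n_bins), which is exactly the constant added on line 172.

```python
import numpy as np

# The map y = x / n_bins scales each of the D dimensions by 1/n_bins,
# so its Jacobian log-determinant is -D * log(n_bins).
n_bins = 256
image_shape = (32, 32, 3)            # assumed input shape, for illustration
D = np.prod(image_shape)             # D = 3072
logdet = -np.log(n_bins) * D         # same form as -np.log(hps.n_bins) * np.prod(...)
print(logdet)                        # about -17035 nats
print(-logdet / (D * np.log(2)))     # 8.0, i.e. 8 bits per dimension
```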

NTT123 commented 6 years ago

@lxuechen I have another explanation.

Without adding `- np.log(hps.n_bins) * np.prod(Z.int_shape(z)[1:])`, `objective` would be the log probability density of a real-valued image. However, what we actually want to compute is the probability mass (not density) of a discrete-valued image. To do that, we need to integrate over all real-valued images whose discretization is our discrete image. We approximate this integral (i.e., the probability) by the probability density at one point times the volume of the bin, which is the formula on line 172.
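As a quick sanity check on the "density × volume" approximation, here is a toy 1-D example (my own, with a standard normal standing in for the model density):

```python
import numpy as np
from scipy.stats import norm

# Toy 1-D check of "mass ~= density * volume" (my notation, not the repo's):
# treat a discrete value k in {0..255} as the continuous interval
# [k/256, (k+1)/256) under a model density p (here a standard normal).
n_bins = 256
k = 128
lo, hi = k / n_bins, (k + 1) / n_bins

# Exact mass via the CDF, versus density-at-a-point times bin volume.
exact_log_mass = np.log(norm.cdf(hi) - norm.cdf(lo))
approx_log_mass = norm.logpdf(lo) - np.log(n_bins)   # log p(y) - log(n_bins)
print(exact_log_mass, approx_log_mass)               # nearly identical
```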

P.S. This paper https://arxiv.org/abs/1511.01844 (Section 3.1) confirms my explanation.