rll / deepul

HW2 autoregressive flow for images solution issue #13

Open patrickmineault opened 3 years ago

patrickmineault commented 3 years ago

I'm pretty sure the log-likelihood in the solution to the second exercise is off. Its nll method is defined as:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1) #.repeat(1, 1, self.n_components, 1, 1)
    log_det_jacobian = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1))
    return -log_det_jacobian.mean()

As you can see, the weights are never used. I believe this should be:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    weights = F.softmax(weight_logits, dim=1)
    # per-component densities of x under each Gaussian component
    component_probs = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1)).exp()
    # mixture density = weighted sum over components, then average the negative log
    return -torch.log((component_probs * weights).sum(dim=2)).mean()
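
For what it's worth, the same weighted-mixture likelihood can be computed more stably in log space with log-sum-exp; this is only a sketch of that variant, keeping the same tensor shapes and reduction dimensions as the proposal above:

  def nll(self, x, cond=None):
    loc, log_scale, weight_logits = torch.chunk(self.forward(x), 3, dim=1)
    log_weights = F.log_softmax(weight_logits, dim=1)
    # per-component log-densities of x
    component_log_probs = Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1, 1, self.n_components, 1, 1))
    # log of the weighted mixture density via log-sum-exp (avoids exp() underflow)
    return -torch.logsumexp(component_log_probs + log_weights, dim=2).mean()
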
blahBlahhhJ commented 2 years ago

Also for this question: I'm wondering why, in the sample method, the solution populates the samples with normally distributed random variables (which can range from -inf to inf), while the documentation says the sample should be "a numpy array of size (100, H, W, 1) of samples with values in [0, 1], where [0, 0.5] represents a black pixel and [0.5, 1] represents a white pixel".

samples[:, k, r, c] = torch.normal(loc[torch.arange(n), chosen_centers], log_scale[torch.arange(n), chosen_centers].exp())
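
(For context, one hypothetical way for the sample method itself to honor the documented range would be to clamp each draw; this is my own sketch, not what the released solution does:)

  # Hypothetical tweak: saturate each pixel draw into the documented [0, 1) range.
  samples[:, k, r, c] = torch.clamp(
      torch.normal(loc[torch.arange(n), chosen_centers],
                   log_scale[torch.arange(n), chosen_centers].exp()),
      0.0, 1.0 - 1e-6)
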
TwilightSpar commented 2 years ago

I am also confused by this part. I think the weighted-sum version is more reasonable. I still have a question about this nll function. If my understanding is right, the latent variable here comes from a mixture of Gaussians, $z \sim \mathrm{MoG}(\vec\mu, \vec\sigma, \vec{w})$, and the negative log-likelihood of a flow model is $$\mathrm{NLL} = \mathbb{E}_x \left[ -\log p_z(z) - \log |\det J| \right].$$ The value that we calculate in the nll function (with the weighted sum added, as you said) is the first part, $-\log p_z(z)$, right? That is, log(Normal(loc, log_scale.exp()).log_prob(x.unsqueeze(1).repeat(1,1,self.n_components,1,1)).exp() * weights).

What about the second part, $-\log |\det J|$? What's more, in this case the network we are using is a PixelCNN, which is a complex flow. Is there really a way to calculate the second part, $-\log |\det J|$? Is this PixelCNN flow even invertible?

I have thought about this for days. Thanks, guys!
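
In case it helps, my own reading (an assumption on my part, not something stated in the solution) is that the per-pixel flow here is the CDF of that pixel's Gaussian mixture: then $z$ lands in $[0, 1]$, the base density is uniform so $-\log p_z(z) = 0$, and $\log |\det J|$ is exactly the log of the mixture density, which is why the nll above would be the whole objective rather than just the prior term. A minimal per-pixel sketch under that assumption:

  import torch
  from torch.distributions import Normal

  def per_pixel_flow_terms(x, loc, log_scale, log_weights):
      # x: pixel values, shape (...,); loc/log_scale/log_weights: shape (..., n_components)
      comp = Normal(loc, log_scale.exp())
      # z = mixture CDF(x) lies in [0, 1], so the base distribution is Uniform(0, 1)
      z = (log_weights.exp() * comp.cdf(x.unsqueeze(-1))).sum(dim=-1)
      log_pz = torch.zeros_like(z)  # log-density of Uniform(0, 1) is 0
      # dz/dx = mixture pdf(x), so log|det J| is the mixture log-density
      log_det = torch.logsumexp(log_weights + comp.log_prob(x.unsqueeze(-1)), dim=-1)
      nll = -(log_pz + log_det)     # reduces to -log(mixture density)
      return z, nll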

TwilightSpar commented 2 years ago

Also for this question: I'm wondering why, in the sample method, the solution populates the samples with normally distributed random variables (which can range from -inf to inf), while the documentation says the sample should be "a numpy array of size (100, H, W, 1) of samples with values in [0, 1], where [0, 0.5] represents a black pixel and [0.5, 1] represents a white pixel".

samples[:, k, r, c] = torch.normal(loc[torch.arange(n), chosen_centers], log_scale[torch.arange(n), chosen_centers].exp())

I checked their q2_save_results function in helper2. It seems they use a clip, only keeping values in the [0, 2) range: samples = np.clip(samples.astype('float') * 2.0, 0, 1.9999)
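
So draws that fall outside [0, 1] are just saturated rather than rejected. A tiny illustration of that line (the sample values below are made up):

  import numpy as np

  # Made-up raw draws from the Normal sampler; some fall outside [0, 1].
  samples = np.array([-0.3, 0.2, 0.7, 1.4])

  # Same post-processing as q2_save_results: scale to [0, 2] and saturate at 1.9999.
  print(np.clip(samples.astype('float') * 2.0, 0, 1.9999))  # [0.  0.4  1.4  1.9999]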