miyosuda / scan

SCAN: Learning Abstract Hierarchical Compositional Visual Concepts

About the loss setting in beta-VAE #1

Open · simplespy opened this issue 6 years ago

simplespy commented 6 years ago

Hello! I have a question about the loss setting in beta-VAE.

In the MODEL ARCHITECTURE part of the paper, the authors said that they "replace the pixel log-likelihood term in Eq. 2 with an L2 loss in the high-level feature space of DAE", as is implemented in your model. [loss = L2(z_d - z_out_d) + beta * KL]

However, in the MODEL DETAILS part, they also said that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems that the loss should be [loss = L2(x_d - x_out_d) + beta * KL]

I'm wondering which one is correct and why the two descriptions are inconsistent. With the pre-trained DAE, in the course of training the beta-VAE, I find these two loss terms don't balance well (the reconstruction loss is much larger than the latent loss).
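For concreteness, here is how I read the two variants, as a minimal PyTorch sketch (your repo is TensorFlow, so this is only an illustration; `dae_encode` and `dae_decode_logits` are hypothetical stand-ins for the pre-trained DAE's encoder and for its decoder output before the sigmoid):

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_out, mu, logvar, dae_encode, dae_decode_logits,
                  beta, feature_space=True):
    # KL divergence between q(z|x) = N(mu, diag(exp(logvar))) and N(0, I).
    kl = -0.5 * torch.sum(1.0 + logvar - mu.pow(2) - logvar.exp())
    if feature_space:
        # MODEL ARCHITECTURE reading: L2 in the DAE's high-level feature space.
        # loss = L2(z_d - z_out_d) + beta * KL
        recon = F.mse_loss(dae_encode(x_out), dae_encode(x), reduction='sum')
    else:
        # MODEL DETAILS reading: L2 in the DAE's pixel space, taken before the
        # final non-linearity. loss = L2(x_d - x_out_d) + beta * KL
        x_d = dae_decode_logits(dae_encode(x))
        x_out_d = dae_decode_logits(dae_encode(x_out))
        recon = F.mse_loss(x_out_d, x_d, reduction='sum')
    return recon + beta * kl
```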

Looking forward to any reply. Thanks a lot!

miyosuda commented 6 years ago

However, in the MODEL DETAILS part, they also said that "The reconstruction error was taken in the last layer of the DAE (in the pixel space of DAE reconstructions) using L2 loss and before the non-linearity." It seems that the loss should be [loss = L2(x_d - x_out_d) + beta * KL]

Yes, I'm not following the paper here; I'm calculating the loss with the DAE bottleneck z. I also tried

[loss = L2(x_d - x_out_d) + beta * KL]

(calculating the loss on the output before the sigmoid activation), but I didn't see much difference in the results. So I'm using

[loss = L2(z_d - z_out_d) + beta * KL]

for the loss calculation.

And I'm using beta=0.5, while the original paper uses beta=53.0:

https://github.com/miyosuda/scan/blob/cc86131f81dec386ac47ee7f2ec9705e032b468d/options.py#L15

Since an L2 loss over the low-dimensional bottleneck z is on a much smaller scale than one over pixels, such a large beta suggests that they are not calculating the loss with the bottleneck z like I am, I think.
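A rough back-of-envelope check (all sizes below are assumptions for illustration, not from the paper or this repo): a summed L2 over pixels has far more terms than one over the bottleneck z, so the KL weight needed to balance it is correspondingly larger.

```python
# Assumed shapes, for illustration only; the actual SCAN input and DAE
# bottleneck sizes may differ.
pixel_terms = 64 * 64          # assumed DAE pixel-space output size
z_terms = 100                  # assumed DAE bottleneck size
print(pixel_terms / z_terms)   # ~41x more terms in the pixel-space L2
```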

One more thing I should mention: when visualizing the output of the VAE, I pass it through the DAE.

https://github.com/miyosuda/scan/blob/cc86131f81dec386ac47ee7f2ec9705e032b468d/model.py#L359-L364

This is because the output of the VAE itself is too noisy.

We calculate the reconstruction loss through the DAE, but a DAE can produce the same output even when its input is noisy. So the reconstruction loss can stay near zero even if the VAE output contains noise, which means the VAE output itself is inherently noisy, I think.

So I pass the VAE output through the DAE to clean it up for visualization.
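The visualization path amounts to something like this minimal PyTorch sketch (the repo itself is TensorFlow; `vae` and `dae` stand for the trained networks, and `through_dae` mirrors the flag in model.py):

```python
import torch

@torch.no_grad()
def visualize(vae, dae, x, through_dae=True):
    x_out = vae(x)          # raw VAE reconstruction; tends to be noisy
    if through_dae:
        x_out = dae(x_out)  # denoise with the pre-trained DAE before display
    return x_out
```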

simplespy commented 6 years ago

You're right. I tried the original loss and there are no obvious differences. It may be due to the convergence of the DAE.

And the second point you mentioned is really helpful!!! I implemented this model in PyTorch, so I only compared your overall architecture against the descriptions in the paper and didn't notice this 'through_dae' term. As you said, the output is noisy, and this operation really makes sense.

Thanks again~