pkuxmq / Invertible-Image-Rescaling

[ECCV 2020, IJCV 2022] Invertible Image Rescaling
Apache License 2.0

Where is the implementation for stage 2 with full distribution matching loss `L_distr` #16

Open JingyunLiang opened 4 years ago

JingyunLiang commented 4 years ago

The given configs (e.g. train_IRN_x4.yml) seem to correspond to stage 1 (the pre-training stage).

pkuxmq commented 4 years ago

train_IRN+_x4.yml

JingyunLiang commented 4 years ago

Sorry, in train_IRN+_x4.yml I only found pixel_criterion_forw, pixel_criterion_back, feature_criterion and gan. Also, in models/IRNP_model.py, only these four kinds of losses are used for optimization. I thought train_IRN+_x4.yml was for generating visually pleasing images.

As in Eq. 10, there is an extra distribution loss (defined in Eq. 9). On page 10, you state: "After the pre-training stage, we restore the full distribution matching loss L_distr (stage 2) in the objective in place of L'_distr (stage 1)."
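Just to be explicit about what I mean by the two stages, here is my reading of the objective (a sketch of Eq. 10, not copied verbatim; the λ_i are the weighting coefficients from the paper):

```latex
% Sketch of my reading of the two training stages: stage 1 (pre-training) uses the
% surrogate L'_distr, which only pushes the forward latent z towards the prior N(0, I);
% stage 2 replaces it with the full distribution matching loss L_distr of Eq. 9.
L_{\text{stage 1}} = \lambda_1 L_{\text{recon}} + \lambda_2 L_{\text{guide}} + \lambda_3 L'_{\text{distr}}
\qquad
L_{\text{stage 2}} = \lambda_1 L_{\text{recon}} + \lambda_2 L_{\text{guide}} + \lambda_3 L_{\text{distr}}
```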

Could you please tell me where the code for L_distr in the second stage is? Thank you.

pkuxmq commented 4 years ago

We employ the JS divergence as the probability metric for distribution matching (Eq. 7). Following the GAN literature, we implement the JS divergence in an adversarial setting where the function T() is regarded as a discriminator. The gan loss in the code is the full distribution matching loss.
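A minimal sketch of that correspondence (a simplified illustration, not the exact loss code in models/IRNP_model.py, which also includes the other loss terms and a specific GAN variant):

```python
import torch
import torch.nn as nn

# Minimal sketch: the vanilla GAN value function
#   V(D, G) = E[log D(x_real)] + E[log(1 - D(x_fake))]
# equals 2 * JS(p_real, p_fake) - log 4 at the optimal discriminator, so training
# the discriminator T() and minimizing the generator term realizes JS-divergence
# matching between real HR images and images reconstructed by the inverse pass.

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(D, x_real, x_fake):
    # T() in Eq. 7 plays the role of the discriminator D (logit outputs assumed).
    real_logits = D(x_real)
    fake_logits = D(x_fake.detach())
    return bce(real_logits, torch.ones_like(real_logits)) \
         + bce(fake_logits, torch.zeros_like(fake_logits))

def generator_distr_loss(D, x_fake):
    # Distribution matching term for the inverse (upscaling) pass; the common
    # non-saturating form is used here for illustration.
    fake_logits = D(x_fake)
    return bce(fake_logits, torch.ones_like(fake_logits))
```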

JingyunLiang commented 4 years ago

Thanks for your quick reply. I read the paragraphs about losses, but I am still confused:

1. Besides the latent variable z, which has a prior, there exists y in the IRN model that is subject to some distributional constraint. z follows the same distribution as in GAN, i.e., N(0, I), while y is constrained to be similar to y_bicubic by L_guide. They are concatenated as the input of the network. Am I right? What is the main difference between this and a conditional GAN, except that the generator is a bijection?

2. "our model does not have a standalone distribution on x" — what does this mean? Does it mean that the model makes no assumption about p(x)?

3. "the conventional way to use adversarial loss simply cannot be applied" — what are the differences in the implementation of the GAN loss compared with ESRGAN?

4. "match towards the data distribution with an essentially different distribution from the GAN model distribution" — from the code, it seems that this model still tries to discriminate x_real and x_fake, which means it is matching towards the distribution p(x_real).

5. From my understanding, both the JS divergence JS(p(x_real), p(x_fake)) and GAN try to minimize the difference between p(x_real) and p(x_fake); in some sense, the GAN objective can be derived from the JS divergence. I understand that this model differs from previous normalizing flows because the objective is no longer MLE on z, but I don't get the point of introducing the JS divergence here.

Thank you in advance for answering my questions.

pkuxmq commented 4 years ago
1. A conditional GAN transforms z into a conditional distribution p(x|y) in which the conditions y are given, while in our model the distribution of y is not fixed but is itself generated by the model under some constraints. Basically, our method jointly models image downscaling and upscaling, rather than only doing inverse generation with conditions from a fixed distribution.

2. It means that the distribution of x in the inverse procedure should depend on y = f^y(x); in other words, (x, y) should follow a joint distribution.

3 & 4. ESRGAN transforms the distribution of LR images to the distribution of HR images and projects each LR image to a single HR image point, while our model, in the inverse procedure, transforms the distribution of the latent variable z (combined with each LR image y) to the distribution p(x|y = f^y(x)), which models the information lost between the HR and LR images. Therefore the adversarial loss plays different roles in principle: in ESRGAN it encourages the generated point for each input point to lie on the real-image manifold (which also holds for a conventional GAN), while in our model it encourages the generated distribution p(x|y = f^y(x)) for each input y = f^y(x) to follow our target distribution, i.e. the real-image manifold around the corresponding HR image. So the distributions being matched are essentially different (see the sketch after this list).

5. We introduce the JS divergence to realize distribution matching and implement it in the form of a GAN loss.
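To make points 3 & 4 concrete, here is a minimal runnable sketch of the two ways the adversarial loss is driven; the modules below are toy stand-ins, not the actual architectures or interfaces in this repository:

```python
import torch
import torch.nn as nn

# Toy stand-ins just to make the contrast runnable; the real generator, invertible
# network and discriminator in this repo are of course different.
esrgan_G = nn.Conv2d(3, 3, 3, padding=1)          # deterministic LR -> HR map (toy)
irn_inverse = nn.Conv2d(3 + 1, 3, 3, padding=1)   # inverse pass taking [y, z] (toy)
D = nn.Conv2d(3, 1, 3, padding=1)                 # discriminator / T() (toy)

lr = torch.randn(4, 3, 32, 32)                    # a batch of LR images y

# ESRGAN-style: one HR point per LR input; the adversarial loss pushes that
# single point onto the real-image manifold.
sr = esrgan_G(lr)
d_esrgan = D(sr)

# IRN-style: for each y = f^y(x), sample the lost information z ~ N(0, I) and run
# the inverse pass, so the model defines a conditional distribution p(x | y).
# The adversarial loss matches this generated conditional distribution to the
# real-image manifold around the corresponding HR image.
z = torch.randn(4, 1, 32, 32)
x_fake = irn_inverse(torch.cat([lr, z], dim=1))
d_irn = D(x_fake)
```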