Questions about the analysis of Dual-IRN in the introduction of the paper

yang-jin-hai / SAIN

[AAAI-2023] Self-Asymmetric Invertible Network for Compression-Aware Image Rescaling

Apache License 2.0


Questions about the analysis of Dual-IRN in the introduction of the paper #5

Closed: ROTTK closed this issue 7 months ago

ROTTK commented 7 months ago

Thank you very much for the open-source code; it's a very interesting work! Regarding the analysis of Dual-IRN in the introduction of the paper, I have some questions about how D-IRN and U-IRN are trained. Are D-IRN and U-IRN trained separately? When D-IRN is trained, are the high-frequency components forced to follow the standard Gaussian distribution? Is the output "high-quality LR" the LR image obtained by fitting the bicubic downsampling of the HR image? Does the training objective of D-IRN include both an HR image reconstruction loss and a high-quality LR image fitting loss? And how is the U-IRN trained: does its training objective include both an HR image reconstruction loss and a compressed LR image fitting loss? How are the high-frequency inputs to the inverse process of the U-IRN obtained during training? Are they still sampled from the standard Gaussian distribution, or from a Gaussian mixture model?

yang-jin-hai commented 7 months ago

Thank you for your appreciation.

D-IRN and U-IRN are trained jointly with a differentiable virtual codec. Their structure and loss functions are the same as the original IRN, but the LR references are different.

Therefore,

Are D-IRN and U-IRN trained separately, ❎ (No, they are trained jointly.)

are the high frequency components forced to follow the standard Gaussian distribution when D-IRN is trained ✅

is the output "high-quality LR" the LR image obtained by fitting the HR image with bicubic downsampling ✅

does the training objective of D-IRN include both HR image reconstruction loss and high quality LR image fitting loss? ✅

And how is the U-IRN trained, does the training objective of the U-IRN include both HR image reconstruction loss and compressed LR image fitting loss? ✅

How are the high frequency component inputs to the inverse process of the U-IRN obtained during training? Is it still sampled from the standard Gaussian distribution? ✅ (Yes, still from the standard Gaussian, not from a Gaussian mixture model.)
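Putting these answers together, the per-network objective can be sketched as below. This is a hedged reconstruction that assumes the three-term form of the original IRN loss (HR reconstruction, LR guidance, latent distribution matching). The symbols $x$, $\hat{x}$, $y$, $z$, $\lambda_i$ and the superscript $n$ are notation introduced here for illustration, and how the compressed LR reference is produced is not specified in this thread.

```latex
% n \in \{D, U\} indexes the two networks (D-IRN and U-IRN).
% x: HR image, \hat{x}^{n}: HR reconstruction, y^{n}: LR output, z^{n}: HF latent.
\mathcal{L}^{n}
  = \lambda_1 \, \ell_{\mathrm{rec}}\bigl(x, \hat{x}^{n}\bigr)
  + \lambda_2 \, \ell_{\mathrm{guide}}\bigl(y^{n}_{\mathrm{ref}}, y^{n}\bigr)
  + \lambda_3 \, \ell_{\mathrm{distr}}\bigl(z^{n}\bigr),
\qquad n \in \{D, U\}

% Only the LR reference differs between the two networks:
y^{D}_{\mathrm{ref}} = \mathrm{Bicubic}(x),
\qquad
y^{U}_{\mathrm{ref}} = \text{compressed LR image}
```

In both cases $\ell_{\mathrm{distr}}$ pushes $z^{n}$ toward $\mathcal{N}(0, I)$, and the two networks are optimized together through the differentiable virtual codec.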

ROTTK commented 7 months ago

Thank you for your reply! As you mentioned, the high-frequency component fed into the inverse process of the U-IRN, i.e., the latent variable, is sampled from the standard Gaussian distribution. Why, then, is the latent variable of the U-IRN observed not to follow the standard Gaussian distribution? And what does it mean that "the final outputs of D-IRN are highly similar to the anterior-layer features of the U-IRN"? Forgive me for not understanding the analysis in the introduction; could you please explain in more detail how this insight was observed? I am very interested in this part! Thanks again for your patience!

yang-jin-hai commented 7 months ago

We are sorry that the introduction is a bit brief and unclear.

The HF components are constrained to follow a standard Gaussian during training, and they are sampled from a standard Gaussian in the backward (upscaling) pass at test time. However, the HF components actually output by the forward (downscaling) pass at test time are distributed more like a Gaussian mixture.
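One simple way to see this kind of mismatch is to compare sample moments of the latents against those of a standard Gaussian (mean 0, variance 1, excess kurtosis 0). The sketch below is only an illustration and not the analysis used in the paper: `moment_summary` is a hypothetical helper, and both sample sets are synthetic stand-ins (a two-component mixture with arbitrary parameters plays the role of the forward-pass latents).

```python
import random


def moment_summary(samples):
    """Return (mean, variance, excess kurtosis) of a list of floats."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    var = sum(c * c for c in centered) / n
    fourth = sum(c ** 4 for c in centered) / n
    # A Gaussian has kurtosis 3, so its excess kurtosis is 0.
    excess_kurtosis = fourth / (var * var) - 3.0
    return mean, var, excess_kurtosis


rng = random.Random(0)
n = 20000

# Stand-in for latents sampled from N(0, 1) in the backward pass.
z_sampled = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Synthetic stand-in for forward-pass latents: an equal-weight mixture of
# N(-2, 0.5^2) and N(2, 0.5^2). These parameters are assumptions chosen
# only to make the effect visible.
z_forward = [rng.gauss(rng.choice((-2.0, 2.0)), 0.5) for _ in range(n)]

print("sampled:", moment_summary(z_sampled))
print("forward:", moment_summary(z_forward))
```

With real models, `z_forward` would be replaced by the flattened HF outputs collected from the forward pass over a test set; a variance far from 1 or a clearly nonzero excess kurtosis suggests the latents are not standard Gaussian.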
The statement that "the final outputs of D-IRN are highly similar to the anterior-layer features of the U-IRN" is a separate observation: we computed the CKA (centered kernel alignment) similarity between the output features of different layers of U-IRN and D-IRN.
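For readers unfamiliar with CKA, a minimal sketch of the linear variant follows. Which CKA variant (linear or kernel) was used is not stated here, so the linear form is an assumption, and the function names and toy features are hypothetical. Each matrix holds one row per input example and one column per feature, so features from two layers can be compared as long as they are computed on the same examples.

```python
import math
import random


def center_columns(mat):
    """Subtract each column's mean (rows are examples, columns are features)."""
    n, d = len(mat), len(mat[0])
    means = [sum(row[j] for row in mat) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in mat]


def cross_frobenius_sq(a, b):
    """Squared Frobenius norm of a^T b for matrices with the same row count."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            dot = sum(a[k][i] * b[k][j] for k in range(len(a)))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA: ||Xc^T Yc||_F^2 / (||Xc^T Xc||_F * ||Yc^T Yc||_F)."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_frobenius_sq(xc, yc)
    denominator = math.sqrt(cross_frobenius_sq(xc, xc) * cross_frobenius_sq(yc, yc))
    return numerator / denominator


rng = random.Random(0)
# Toy "layer features": 200 examples, 4 features each.
x = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]
# y is x with columns permuted, one sign flipped, and everything scaled by 2.
y = [[2 * r[1], 2 * r[0], -2 * r[2], 2 * r[3]] for r in x]
# w is unrelated noise with the same shape.
w = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

print("CKA(x, y):", linear_cka(x, y))
print("CKA(x, w):", linear_cka(x, w))
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling of the features, which is why `x` and `y` score as near-identical while `x` and `w` score close to zero. In a layer-wise comparison, `x` and `y` would be flattened activations from a D-IRN layer and a U-IRN layer on the same batch of images.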

ROTTK commented 7 months ago

Thank you very much for your detailed answer!