yang-jin-hai / SAIN

[AAAI-2023] Self-Asymmetric Invertible Network for Compression-Aware Image Rescaling
Apache License 2.0

Questions about the analysis of Dual-IRN in the introduction of the paper #5

Closed: ROTTK closed this issue 7 months ago

ROTTK commented 7 months ago

Thank you very much for open-sourcing the code; it's a very interesting work! Regarding the analysis of Dual-IRN in the introduction of the paper, I have some doubts about how D-IRN and U-IRN are trained:

Are D-IRN and U-IRN trained separately?

When D-IRN is trained, are the high-frequency components forced to follow the standard Gaussian distribution?

Is the output "high-quality LR" image obtained by fitting the bicubic-downsampled version of the HR image?

Does the training objective of D-IRN include both an HR image reconstruction loss and a high-quality LR image fitting loss?

How is U-IRN trained? Does its training objective include both an HR image reconstruction loss and a compressed LR image fitting loss?

During training, how are the high-frequency components fed to the inverse process of U-IRN obtained? Are they still sampled from the standard Gaussian distribution, or from a Gaussian mixture model?

yang-jin-hai commented 7 months ago

Thank you for your appreciation.

Therefore,

Are D-IRN and U-IRN trained separately? ❎

When D-IRN is trained, are the high-frequency components forced to follow the standard Gaussian distribution? ✅

Is the output "high-quality LR" image obtained by fitting the bicubic-downsampled version of the HR image? ✅

Does the training objective of D-IRN include both an HR image reconstruction loss and a high-quality LR image fitting loss? ✅

How is U-IRN trained? Does its training objective include both an HR image reconstruction loss and a compressed LR image fitting loss? ✅

During training, how are the high-frequency components fed to the inverse process of U-IRN obtained? Are they still sampled from the standard Gaussian distribution? ✅
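Taken together, the answers above describe two training objectives. The sketch below spells them out in plain Python under several assumptions the thread does not state: images are flattened lists of floats, both the LR fitting and HR reconstruction terms use mean squared error, the Gaussian constraint on D-IRN's high-frequency latent is a simple squared-norm penalty (the negative log-likelihood under N(0, 1) up to a constant), and the weights `w_lr`, `w_hr`, `w_z` are hypothetical. The function names are illustrative and are not taken from the SAIN repository.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length flat lists of values."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def latent_gaussian_penalty(z):
    """Mean of z**2 / 2, the per-element negative log-likelihood of z under
    N(0, 1) up to an additive constant (assumed form of the Gaussian constraint)."""
    return sum(v * v for v in z) / (2 * len(z))


def d_irn_loss(lr_pred, lr_bicubic, hr_rec, hr, z, w_lr=1.0, w_hr=1.0, w_z=1.0):
    # Hypothetical D-IRN objective (MSE for both terms is an assumption):
    # - LR fitting: forward LR output vs. the bicubic-downsampled HR image
    # - HR reconstruction: inverse-pass output vs. the original HR image
    # - latent term: pushes the high-frequency component z toward N(0, 1)
    return (w_lr * mse(lr_pred, lr_bicubic)
            + w_hr * mse(hr_rec, hr)
            + w_z * latent_gaussian_penalty(z))


def u_irn_loss(lr_pred, lr_compressed, hr_rec, hr, w_lr=1.0, w_hr=1.0):
    # Hypothetical U-IRN objective: compressed-LR fitting plus HR reconstruction.
    # No latent term here, since the replies only confirm these two losses.
    return w_lr * mse(lr_pred, lr_compressed) + w_hr * mse(hr_rec, hr)


def sample_latent(n, rng):
    """Draw the high-frequency input for U-IRN's inverse pass from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

The replies say only that the two networks are not trained separately; how their objectives are weighted or combined is not specified in this thread, so the sketch leaves that open.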

ROTTK commented 7 months ago

Thank you for your reply! As you mentioned, the high-frequency component fed to the inverse process of U-IRN, i.e., the latent variable, is sampled from the standard Gaussian distribution. Why, then, is the latent variable of U-IRN observed not to follow the standard Gaussian distribution? And what does it mean that "the final outputs of D-IRN are highly similar to the anterior-layer features of the U-IRN"? Forgive me for not understanding the analysis in the introduction, but could you explain in more detail how this insight was observed? I am very interested in this part! Thanks again for your patience!
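Both observations in the question can be quantified in simple ways. The following is a hypothetical sketch, not necessarily the procedure the authors used: fit a Gaussian to the latent values by their mean and variance and measure its KL divergence from N(0, 1), and compare D-IRN's final output with a U-IRN intermediate feature map via cosine similarity. The function names and the choice of these two metrics are assumptions for illustration.

```python
import math


def kl_to_standard_normal(samples):
    """KL(N(mu, var) || N(0, 1)) for a Gaussian fitted to `samples` by moments.

    Zero when the fitted mean is 0 and the variance is 1; grows as either drifts.
    (Moment-matched Gaussian fit is an illustrative assumption.)
    """
    n = len(samples)
    mu = sum(samples) / n
    var = sum((s - mu) ** 2 for s in samples) / n
    if var <= 0:
        raise ValueError("samples must not all be equal")
    return 0.5 * (var + mu * mu - 1.0 - math.log(var))


def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature maps of equal length."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("feature maps must be non-zero")
    return dot / (norm_a * norm_b)
```

A KL value near zero would indicate a latent close to N(0, 1) in its first two moments only; a cosine similarity near 1 would indicate that two feature maps point in nearly the same direction, regardless of scale.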

yang-jin-hai commented 7 months ago

We are sorry that the introduction is a bit brief and unclear.

ROTTK commented 7 months ago

Thank you very much for your detailed answer!